Audio Module#

Additional dependencies

To use the soundevent.audio module you need to install some additional dependencies. Make sure you have them installed by running the following command:

pip install soundevent[audio]

`soundevent.audio` #

Soundevent functions for handling audio files and arrays.

Classes#

`MediaInfo(samplerate_hz, duration_s, samples, channels, format, subtype)` `dataclass` #

MediaInfo Class.

Encapsulates essential metadata about audio data for processing and analysis. The information stored in this dataclass can typically be automatically extracted from the audio file itself.

Attributes#

`channels: int` `instance-attribute` #

The number of audio channels present in the data.

For example 1 for mono, 2 for stereo.

`duration_s: float` `instance-attribute` #

The total duration of the audio, measured in seconds (s).

This represents the length of time the audio recording spans.

`format: str` `instance-attribute` #

A code representing the audio file format.

For example "WAV", "MP3".

`samplerate_hz: int` `instance-attribute` #

The sampling rate of the audio, measured in Hertz (Hz).

This indicates the number of samples taken per second to represent the analog audio signal.

`samples: int` `instance-attribute` #

The total number of samples in the audio data.

`subtype: str` `instance-attribute` #

A more specific subtype of the audio format.

For example "PCM_16", "A_LAW". These subtypes provide additional information about the audio data, such as the bit depth for PCM encoded audio, or encoding algorithm for compressed audio formats.

Functions#

`compute_md5_checksum(path)` #

Compute the MD5 checksum of a file.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	Path to the file.	required

Returns:

Type	Description
`str`	MD5 checksum of the file.
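
A checksum like this is usually computed by streaming the file in chunks, so that long recordings never have to fit in memory. The sketch below uses only the standard-library `hashlib` module. The helper name `md5_of_file` and the chunk size are assumptions made for illustration; this is not the library's implementation.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest is updated incrementally, the result does not depend on the chunk size chosen.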

`compute_spectrogram(audio, window_size, hop_size, window_type='hann', detrend=False, padded=True, boundary='zeros')` #

Compute the spectrogram of a signal.

This function calculates the short-time Fourier transform (STFT), which decomposes a signal into overlapping windows and computes the Fourier transform of each window.

Parameters:

Name	Type	Description	Default
`audio`	`DataArray`	The audio signal.	required
`window_size`	`float`	The duration of the STFT window in seconds.	required
`hop_size`	`float`	The duration of the STFT hop (in seconds). This determines the time step between consecutive STFT frames.	required
`window_type`	`str`	The type of window to use. Refer to `scipy.signal.get_window` for supported types.	`'hann'`
`detrend`	`Union[str, Callable, Literal[False]]`	Specifies how to detrend each STFT window. See `scipy.signal.stft` for options. Default is False (no detrending).	`False`
`padded`	`bool`	Indicates whether the input signal is zero-padded at the beginning and end before performing the STFT. See `scipy.signal.stft`. Default is True.	`True`
`boundary`	`Optional[Literal['zeros', 'odd', 'even', 'constant']]`	Specifies the boundary extension mode for padding the signal to perform the STFT. See `scipy.signal.stft`. Default is "zeros".	`'zeros'`

Returns:

Name	Type	Description
`spectrogram`	`DataArray`	The spectrogram of the audio signal. This is a three-dimensional xarray data array with the dimensions frequency, time, and channel.

Notes

Time Bin Calculation:

* The time axis of the spectrogram represents the center of each STFT window.
* The first time bin is centered at time `t = hop_size / 2`.
* Subsequent time bins are spaced by `hop_size`.
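
The time-bin rule in these notes can be written out as plain arithmetic. The sketch below is illustrative only: `time_bin_centers` and `seconds_to_samples` are hypothetical helpers, and rounding the window length to the nearest sample is an assumption about how a duration in seconds maps to a sample count.

```python
def time_bin_centers(hop_size, num_frames):
    """Centers of the STFT frames: hop_size / 2, then steps of hop_size."""
    return [hop_size / 2 + index * hop_size for index in range(num_frames)]


def seconds_to_samples(duration_s, samplerate):
    """Convert a duration in seconds to a whole number of samples."""
    # Assumption: round to the nearest sample.
    return int(round(duration_s * samplerate))


centers = time_bin_centers(hop_size=0.5, num_frames=3)
# centers == [0.25, 0.75, 1.25]
```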

`filter(audio, low_freq=None, high_freq=None, order=5, dim=Dimensions.time.value)` #

Filter audio data.

This function assumes that the input audio object is an `xarray.DataArray` with a "samplerate" attribute and a "time" dimension.

The filtering is done using a Butterworth filter of the specified order. The filter type (lowpass, highpass, or bandpass) is determined by which cutoff frequencies are specified. If only one cutoff frequency is specified, a lowpass or highpass filter is used. If both cutoff frequencies are specified, a bandpass filter is used.

Parameters:

Name	Type	Description	Default
`audio`	`DataArray`	The audio data to filter with a "samplerate" attribute and a "time" dimension.	required
`low_freq`	`float`	The low cutoff frequency in Hz.	`None`
`high_freq`	`float`	The high cutoff frequency in Hz.	`None`
`order`	`int`	The order of the filter. By default, 5.	`5`
`dim`	`str`	The dimension along which to filter the audio data. By default, "time".	`Dimensions.time.value`

Returns:

Type	Description
`DataArray`	The filtered audio data.

Raises:

Type	Description
`ValueError`	If neither low_freq nor high_freq is specified, or if both are specified and low_freq > high_freq.
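
The rule for choosing the filter type can be sketched as a small selection function. Two points here are assumptions rather than documented behaviour: that `low_freq` alone yields a highpass filter and `high_freq` alone a lowpass filter, and that the returned names mirror the `btype` values accepted by `scipy.signal.butter`. The helper `select_filter` is hypothetical.

```python
def select_filter(low_freq=None, high_freq=None):
    """Return (btype, cutoff) for the given cutoff frequencies in Hz."""
    if low_freq is None and high_freq is None:
        raise ValueError("Specify at least one of low_freq or high_freq.")
    if low_freq is None:
        # Only an upper cutoff: keep everything below it.
        return "lowpass", high_freq
    if high_freq is None:
        # Only a lower cutoff: keep everything above it.
        return "highpass", low_freq
    if low_freq > high_freq:
        raise ValueError("low_freq must not exceed high_freq.")
    # Both cutoffs: keep the band between them.
    return "bandpass", (low_freq, high_freq)
```

The two `ValueError` branches correspond to the two conditions listed in the Raises table above.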

`get_audio_files(path, strict=False, recursive=True, follow_symlinks=False)` #

Return a generator of audio files in a directory.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	Path to the directory.	required
`strict`	`bool`	Whether to check the file contents to ensure it is an audio file. This takes a bit longer to run. By default False.	`False`
`recursive`	`bool`	Whether to search the directory recursively, by default True. This means that all audio files in subdirectories will be included. If False, only the audio files at the top level of the directory will be included.	`True`
`follow_symlinks`	`bool`	Whether to follow symbolic links, by default False. Care should be taken when following symbolic links to avoid infinite loops.	`False`

Yields:

Type	Description
`Path`	Path to the audio file.

Raises:

Type	Description
`ValueError`	If the path is not a directory.

Notes

This function uses the is_audio_file function to check if a file is an audio file. See the documentation for is_audio_file for more information on which audio file formats are supported.

Examples:

>>> from soundevent.audio.files import get_audio_files

Get all audio files in a directory recursively:

>>> for file in get_audio_files("path/to/directory"):
...     print(file)

Get all audio files in a directory without recursion:

>>> for file in get_audio_files(
...     "path/to/directory", recursive=False
... ):
...     print(file)

`get_media_info(path)` #

Return the media information from the WAV file.

The information extracted from the WAV file is the audio format, the bit depth, the sample rate, the duration, the number of samples, and the number of channels.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	Path to the WAV file.	required

Returns:

Name	Type	Description
`media_info`	`MediaInfo`	Information about the WAV file.

Raises:

Type	Description
`ValueError`	If the WAV file is not PCM encoded.
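
To make the listed fields concrete, the sketch below collects comparable metadata from a PCM WAV file using only the standard-library `wave` module. It is a stand-in, not the library's implementation: the helper `read_wav_info` is hypothetical, counting `samples` as frames per channel is an assumption, and the subtype string is merely built to resemble names such as "PCM_16".

```python
import wave


def read_wav_info(path):
    """Collect basic metadata from a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        samplerate_hz = wav.getframerate()
        # Assumption: "samples" counts frames, i.e. samples per channel.
        samples = wav.getnframes()
        return {
            "samplerate_hz": samplerate_hz,
            # Duration follows from the sample count and the sample rate.
            "duration_s": samples / samplerate_hz,
            "samples": samples,
            "channels": wav.getnchannels(),
            "format": "WAV",
            # Bit depth is the sample width in bytes times eight.
            "subtype": f"PCM_{8 * wav.getsampwidth()}",
        }
```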

`is_audio_file(path, strict=False)` #

Return whether the file is an audio file.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	Path to the file.	required
`strict`	`bool`	Whether to check the file contents to ensure it is an audio file. This takes a bit longer to run. By default False.	`False`

Returns:

Type	Description
`bool`	Whether the file is an audio file.

Notes

The list of supported audio file extensions contains most of the audio file formats supported by the libsndfile library. See: https://libsndfile.github.io/libsndfile/

Some formats were excluded as they do not support seeking and thus are not suitable for random access.

Supported formats:

aiff
au
avr
caf
flac
htk
ircam
mat4
mat5
mp3
mpc2k
nist
ogg
paf
pvf
rf64
sds
svx
voc
w64
wav
wavex
wve
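
A non-strict check of this kind can be approximated by comparing the file suffix against the extensions listed above. The sketch below is a minimal illustration: the helper `has_audio_extension` is hypothetical, and treating the comparison as case-insensitive is an assumption. It does not inspect file contents, which is what `strict=True` is described as doing.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
AUDIO_EXTENSIONS = {
    "aiff", "au", "avr", "caf", "flac", "htk", "ircam", "mat4",
    "mat5", "mp3", "mpc2k", "nist", "ogg", "paf", "pvf", "rf64",
    "sds", "svx", "voc", "w64", "wav", "wavex", "wve",
}


def has_audio_extension(path):
    """Return True when the file suffix is one of the listed extensions."""
    # Path.suffix includes the leading dot, e.g. ".wav".
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in AUDIO_EXTENSIONS
```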

`load_audio(path, offset=0, samples=None)` #

Load an audio file.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	The path to the audio file.	required
`offset`	`int`	The offset in samples from the start of the audio file.	`0`
`samples`	`Optional[int]`	The number of samples to load. If None, load the entire file.	`None`

Returns:

Name	Type	Description
	`ndarray`	The audio data.
`samplerate`	`int`	The sample rate of the audio file in Hz.
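
Both `offset` and `samples` are expressed in samples, not seconds. The sketch below mimics that behaviour for a WAV file with the standard-library `wave` module. It is only an illustration: the helper `read_frames` is hypothetical and returns raw bytes rather than an array.

```python
import wave


def read_frames(path, offset=0, samples=None):
    """Read `samples` frames starting `offset` frames into a WAV file."""
    with wave.open(str(path), "rb") as wav:
        if samples is None:
            # Default to everything from the offset to the end of the file.
            samples = wav.getnframes() - offset
        # Positions are counted in frames, so no byte arithmetic is needed.
        wav.setpos(offset)
        return wav.readframes(samples), wav.getframerate()
```

To start reading at a time given in seconds, multiply that time by the sample rate and convert the result to an integer before passing it as `offset`.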

`load_clip(clip, audio_dir=None)` #

Load a clip from a file.

Parameters:

Name	Type	Description	Default
`clip`	`Clip`	The clip to load.	required
`audio_dir`	`Optional[PathLike]`	The directory containing the audio file. If None, the recording path is assumed to be relative to the current working directory or an absolute path.	`None`

Returns:

Name	Type	Description
`audio`	`DataArray`	The loaded clip. The returned clip stores the samplerate and time expansion of the recording from which it was extracted.

`load_recording(recording, audio_dir=None)` #

Load a recording from a file.

Parameters:

Name	Type	Description	Default
`recording`	`Recording`	The recording to load.	required
`audio_dir`	`Optional[PathLike]`	The directory containing the audio file. If None, the recording path is assumed to be relative to the current working directory or an absolute path.	`None`

Returns:

Name	Type	Description
`audio`	`DataArray`	The loaded recording.

`pcen(array, smooth=0.025, gain=0.98, bias=2, power=0.5, eps=1e-06, dim=Dimensions.time.value)` #

Apply PCEN to spectrogram.

Parameters:

Name	Type	Description	Default
`array`	`DataArray`	The spectrogram to which to apply PCEN.	required
`smooth`	`float`	The time constant for smoothing the input spectrogram. By default, 0.025.	`0.025`
`gain`	`float`	The gain factor for the PCEN transform. By default, 0.98.	`0.98`
`bias`	`float`	The bias factor for the PCEN transform. By default, 2.	`2`
`power`	`float`	The power factor for the PCEN transform. By default, 0.5.	`0.5`
`eps`	`float`	An epsilon value to prevent division by zero. By default, 1e-6.	`1e-06`
`dim`	`str`	The dimension along which to apply PCEN. By default, "time".	`Dimensions.time.value`

Returns:

Type	Description
`DataArray`	Spectrogram with PCEN applied.

Notes

This function applies the Per-Channel Energy Normalization (PCEN) transform to a spectrogram, as described in [1].

The PCEN transform is defined as:

\[ PCEN(X) = \left(\frac{X}{(\epsilon + S)^{\alpha}} + \delta\right)^r - \delta^r \]

where \(X\) is the input spectrogram, \(S\) is the smoothed version of the input spectrogram, \(\epsilon\) is a small constant that prevents division by zero, \(\alpha\) is the gain factor, \(\delta\) is the bias factor, and \(r\) is the power factor.

The smoothed version of the input spectrogram is computed using a first-order IIR filter:

\[ S_t = (1 - \beta) S_{t-1} + \beta X_t \]

where \(\beta\) is the smoothing factor.

The default values for the parameters are taken from the PCEN paper [1].
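
The two formulas can be traced on a single sequence of energies in plain Python. This is a sketch under stated assumptions, not the library's implementation. The helper `pcen_1d` is hypothetical, and the smoother is initialised at the first frame, which is an assumption. The parameter names follow the function signature, with `gain` as the exponent on the smoothed term and `power` as the outer exponent, as in the cited paper.

```python
def pcen_1d(values, smooth=0.025, gain=0.98, bias=2.0, power=0.5, eps=1e-6):
    """Apply PCEN to one sequence of non-negative energies."""
    if not values:
        return []
    output = []
    smoothed = values[0]  # assumption: start the smoother at the first frame
    for value in values:
        # First-order IIR smoother: S_t = (1 - smooth) * S_{t-1} + smooth * X_t
        smoothed = (1 - smooth) * smoothed + smooth * value
        # Gain control: divide by the smoothed energy raised to `gain`.
        normalized = value / (eps + smoothed) ** gain
        # Root compression with a bias offset, shifted so zero maps to zero.
        output.append((normalized + bias) ** power - bias ** power)
    return output
```

For a constant input the smoother stays at that constant, so every output value is the same. A silent input maps to zero.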

References

[1] Wang, Y., Getreuer, P., Hughes, T., Lyon, R. F., & Saurous, R. A. (2017, March). Trainable frontend for robust and far-field keyword spotting. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5670-5674). IEEE.

`resample(array, target_samplerate, window=None, dim=Dimensions.time.value)` #

Resample array data to a target sample rate along a given dimension.

Parameters:

Name	Type	Description	Default
`array`	`DataArray`	The data array to resample.	required
`target_samplerate`	`int`	The target sample rate of the resampled data in Hz.	required
`window`	`Optional[str]`	The window to use for resampling. See scipy.signal.resample for details.	`None`
`dim`	`str`	The dimension along which to resample the audio data. By default, "time".	`Dimensions.time.value`

Returns:

Type	Description
`DataArray`	The resampled audio data.

Notes

This function uses `scipy.signal.resample` to resample the input data array to the target sample rate. That function applies the Fourier method, which is suitable for resampling audio data. For other resampling methods, consider using the `xarray.DataArray.interp` method.

Raises:

Type	Description
`ValueError`	If the input audio object is not an `xarray.DataArray`, or if it does not have a "samplerate" attribute, or if it does not have a "time" dimension.
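
Resampling keeps the duration fixed, so the number of samples scales with the ratio of the two sample rates. The sketch below shows only that arithmetic. The helper `resampled_length` is hypothetical, and rounding to the nearest integer is an assumption. The resulting count is the kind of value that `scipy.signal.resample` takes as its `num` argument.

```python
def resampled_length(num_samples, samplerate, target_samplerate):
    """Number of samples after resampling, keeping the duration fixed."""
    # duration = num_samples / samplerate is unchanged, so the sample
    # count scales by target_samplerate / samplerate.
    return int(round(num_samples * target_samplerate / samplerate))
```

For example, one second of audio at 44100 Hz resampled to 22050 Hz contains 22050 samples.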
