Audio Module#
Additional dependencies
To use the soundevent.audio module you need to install some additional dependencies. Make sure you have them installed by running the following command:
pip install soundevent[audio]
soundevent.audio
#
Soundevent functions for handling audio files and arrays.
Modules:
Name | Description |
---|---|
attributes | Definition of common attributes for audio objects. |
files | Functions for recognizing audio files. |
io | Input and output functions for soundevent. |
media_info | Functions for getting media information from WAV files. |
operations | |
spectrograms | Functions to compute several spectral representations of sound signals. |
Classes:
Name | Description |
---|---|
MediaInfo | MediaInfo Class. |
Functions:
Name | Description |
---|---|
compute_md5_checksum | Compute the MD5 checksum of a file. |
compute_spectrogram | Compute the spectrogram of a signal. |
filter | Filter audio data. |
get_audio_files | Return a generator of audio files in a directory. |
get_media_info | Return the media information from the WAV file. |
is_audio_file | Return whether the file is an audio file. |
load_audio | Load an audio file. |
load_clip | Load a clip from a file. |
load_recording | Load a recording from a file. |
pcen | Apply PCEN to spectrogram. |
resample | Resample array data to a target sample rate along a given dimension. |
Classes#
MediaInfo(samplerate_hz, duration_s, samples, channels, format, subtype)
dataclass
#
MediaInfo Class.
Encapsulates essential metadata about audio data for processing and analysis. The information stored in this dataclass can typically be automatically extracted from the audio file itself.
Attributes:
Name | Type | Description |
---|---|---|
channels | int | The number of audio channels present in the data. |
duration_s | float | The total duration of the audio, measured in seconds (s). |
format | str | A code representing the audio file format. |
samplerate_hz | int | The sampling rate of the audio, measured in Hertz (Hz). |
samples | int | The total number of samples in the audio data. |
subtype | str | A more specific subtype of the audio format. |
Attributes#
channels: int
instance-attribute
#
The number of audio channels present in the data.
For example 1 for mono, 2 for stereo.
duration_s: float
instance-attribute
#
The total duration of the audio, measured in seconds (s).
This represents the length of time the audio recording spans.
format: str
instance-attribute
#
A code representing the audio file format.
For example "WAV", "MP3".
samplerate_hz: int
instance-attribute
#
The sampling rate of the audio, measured in Hertz (Hz).
This indicates the number of samples taken per second to represent the analog audio signal.
samples: int
instance-attribute
#
The total number of samples in the audio data.
subtype: str
instance-attribute
#
A more specific subtype of the audio format.
For example "PCM_16", "A_LAW". These subtypes provide additional information about the audio data, such as the bit depth for PCM encoded audio, or encoding algorithm for compressed audio formats.
Functions#
compute_md5_checksum(path)
#
Compute the MD5 checksum of a file.
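As a rough illustration of what an MD5 checksum of a file involves, the sketch below hashes a file in chunks using only the Python standard library. The helper name `md5_of_file` and its `chunk_size` parameter are hypothetical; this is not the library's implementation.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so a large recording never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest, which is what makes a checksum useful for detecting duplicated or modified recordings.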
compute_spectrogram(audio, window_size, hop_size, window_type='hann', detrend=False, padded=True, boundary='zeros')
#
Compute the spectrogram of a signal.
This function calculates the short-time Fourier transform (STFT), which decomposes a signal into overlapping windows and computes the Fourier transform of each window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio | DataArray | The audio signal. | required |
window_size | float | The duration of the STFT window in seconds. | required |
hop_size | float | The duration of the STFT hop (in seconds). This determines the time step between consecutive STFT frames. | required |
window_type | str | The type of window to use. | 'hann' |
detrend | Union[str, Callable, Literal[False]] | Specifies how to detrend each STFT window. | False |
padded | bool | Indicates whether the input signal is zero-padded at the beginning and end before performing the STFT. | True |
boundary | Optional[Literal['zeros', 'odd', 'even', 'constant']] | Specifies the boundary extension mode for padding the signal to perform the STFT. | 'zeros' |
Returns:
Name | Type | Description |
---|---|---|
spectrogram | DataArray | The spectrogram of the audio signal. This is a three-dimensional xarray data array with the dimensions frequency, time, and channel. |
Notes
Time Bin Calculation:
- The time axis of the spectrogram represents the center of each STFT window.
- The first time bin is centered at time t = hop_size / 2.
- Subsequent time bins are spaced by hop_size.
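The time bin rule in this note can be written out as a small calculation. The function below is a hypothetical helper that simply follows the note as stated (first bin at `hop_size / 2`, later bins spaced by `hop_size`). It does not call the library.

```python
def stft_time_bins(num_frames, hop_size):
    """Return the centre time in seconds of each STFT frame.

    Follows the note above: the first bin sits at hop_size / 2 and each
    later bin is hop_size further along.
    """
    return [hop_size / 2 + index * hop_size for index in range(num_frames)]


# With a 0.5 s hop, the first three bins are centred at 0.25 s, 0.75 s and 1.25 s.
bins = stft_time_bins(3, 0.5)
```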
filter(audio, low_freq=None, high_freq=None, order=5, dim=Dimensions.time.value)
#
Filter audio data.
This function assumes that the input audio object is an xarray.DataArray with a "samplerate" attribute and a "time" dimension.
The filtering is done using a Butterworth filter of the specified order. The type of filter (lowpass, highpass, or bandpass) is determined by the specified cutoff frequencies. If only one cutoff frequency is specified, a lowpass or highpass filter is used. If both cutoff frequencies are specified, a bandpass filter is used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio | DataArray | The audio data to filter with a "samplerate" attribute and a "time" dimension. | required |
low_freq | float | The low cutoff frequency in Hz. | None |
high_freq | float | The high cutoff frequency in Hz. | None |
order | int | The order of the filter. By default, 5. | 5 |
dim | str | The dimension along which to filter the audio data. By default, "time". | Dimensions.time.value |
Returns:
Type | Description |
---|---|
DataArray | The filtered audio data. |
Raises:
Type | Description |
---|---|
ValueError | If neither low_freq nor high_freq is specified, or if both are specified and low_freq > high_freq. |
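The way the two cutoff frequencies select a filter type, and the error cases listed above, can be sketched as a small decision function. The helper name `choose_filter_type` is hypothetical. The mapping of a lone `high_freq` to a lowpass filter and a lone `low_freq` to a highpass filter is an assumption based on the usual meaning of these cutoffs, since the text does not spell it out.

```python
def choose_filter_type(low_freq=None, high_freq=None):
    """Return (filter_type, cutoff) for the given cutoff frequencies.

    Hypothetical helper. Assumes a lone high_freq means lowpass and a
    lone low_freq means highpass.
    """
    if low_freq is None and high_freq is None:
        raise ValueError("At least one of low_freq and high_freq is required.")
    if low_freq is None:
        # Only an upper cutoff: keep everything below it.
        return "lowpass", high_freq
    if high_freq is None:
        # Only a lower cutoff: keep everything above it.
        return "highpass", low_freq
    if low_freq > high_freq:
        raise ValueError("low_freq must not exceed high_freq.")
    return "bandpass", (low_freq, high_freq)
```

In a real filter the returned type and cutoff would then be handed to a Butterworth design routine together with the filter order and the sample rate.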
get_audio_files(path, strict=False, recursive=True, follow_symlinks=False)
#
Return a generator of audio files in a directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to the directory. | required |
strict | bool | Whether to check the file contents to ensure it is an audio file. Will take a bit longer to run, by default False. | False |
recursive | bool | Whether to search the directory recursively, by default True. This means that all audio files in subdirectories will be included. If False, only the audio files at the top level of the directory will be included. | True |
follow_symlinks | bool | Whether to follow symbolic links, by default False. Care should be taken when following symbolic links to avoid infinite loops. | False |
Yields:
Type | Description |
---|---|
Path | Path to the audio file. |
Raises:
Type | Description |
---|---|
ValueError | If the path is not a directory. |
Notes
This function uses the is_audio_file function to check if a file is an audio file. See the documentation for is_audio_file for more information on which audio file formats are supported.
Examples:
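Get all audio files in a directory recursively (the default): iterate over get_audio_files(path).
Get all audio files in a directory without recursion: iterate over get_audio_files(path, recursive=False).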
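To make the difference between the recursive and non-recursive search concrete, here is a minimal standard-library sketch of an extension-based directory scan. The names `iter_audio_files` and `AUDIO_EXTENSIONS` are hypothetical, the extension set is only a small subset of the formats listed for is_audio_file, and the case-insensitive match is a choice made for this sketch. Symlink handling and the strict content check are not modelled.

```python
from pathlib import Path

# Illustrative subset of the extensions listed for is_audio_file.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".aiff"}


def iter_audio_files(path, recursive=True):
    """Yield paths under `path` whose extension looks like an audio file."""
    directory = Path(path)
    if not directory.is_dir():
        raise ValueError(f"{directory} is not a directory")
    # rglob descends into subdirectories, glob stays at the top level.
    candidates = directory.rglob("*") if recursive else directory.glob("*")
    for candidate in candidates:
        if candidate.is_file() and candidate.suffix.lower() in AUDIO_EXTENSIONS:
            yield candidate
```

Because the function is a generator, the directory check only runs once iteration starts, and files are reported lazily as they are found.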
get_media_info(path)
#
Return the media information from the WAV file.
The information extracted from the WAV file is the audio format, the bit depth, the sample rate, the duration, the number of samples, and the number of channels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to the WAV file. | required |
Returns:
Name | Type | Description |
---|---|---|
media_info | MediaInfo | Information about the WAV file. |
Raises:
Type | Description |
---|---|
ValueError | If the WAV file is not PCM encoded. |
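The fields described above can all be derived from a WAV header. The sketch below reads them with Python's standard `wave` module, which is a stand-in for illustration and not the parser this library uses. The helper name `read_wav_info` is hypothetical, and treating "samples" as the number of frames per channel is an assumption.

```python
import wave


def read_wav_info(path):
    """Return basic header information for a PCM WAV file as a dict."""
    with wave.open(str(path), "rb") as wav:
        samplerate_hz = wav.getframerate()
        samples = wav.getnframes()  # frames per channel (assumed meaning)
        return {
            "samplerate_hz": samplerate_hz,
            "channels": wav.getnchannels(),
            "samples": samples,
            # Duration follows directly from sample count and sample rate.
            "duration_s": samples / samplerate_hz,
            # Sample width is reported in bytes, so convert to bits.
            "bit_depth": 8 * wav.getsampwidth(),
        }
```

For example, a mono file with 4000 frames at 8000 Hz has a duration of 0.5 seconds.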
is_audio_file(path, strict=False)
#
Return whether the file is an audio file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to the file. | required |
strict | bool | Whether to check the file contents to ensure it is an audio file. Will take a bit longer to run, by default False. | False |
Returns:
Type | Description |
---|---|
bool | Whether the file is an audio file. |
Notes
The list of supported audio file extensions contains most of the audio file formats supported by the libsndfile library. See: https://libsndfile.github.io/libsndfile/
Some formats were excluded because they do not support seeking and are thus not suitable for random access.
Supported formats:
- aiff
- au
- avr
- caf
- flac
- htk
- ircam
- mat4
- mat5
- mp3
- mpc2k
- nist
- ogg
- paf
- pvf
- rf64
- sds
- svx
- voc
- w64
- wav
- wavex
- wve
load_audio(path, offset=0, samples=None)
#
Load an audio file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | The path to the audio file. | required |
offset | int | The offset in samples from the start of the audio file. | 0 |
samples | Optional[int] | The number of samples to load. If None, load the entire file. | None |
Returns:
Name | Type | Description |
---|---|---|
 | ndarray | The audio data. |
samplerate | int | The sample rate of the audio file in Hz. |
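The meaning of `offset` and `samples` can be illustrated with the standard `wave` module: seek to a frame position, then read a fixed number of frames. This is a sketch under stated assumptions and not the library's loader. The helper name `read_wav_frames` is hypothetical, it only handles PCM WAV files, and it returns raw bytes rather than an array.

```python
import wave


def read_wav_frames(path, offset=0, samples=None):
    """Return (raw_frames, samplerate) for part of a PCM WAV file.

    `offset` and `samples` are counted in frames, i.e. one sample per channel.
    """
    with wave.open(str(path), "rb") as wav:
        # Move the read position to the requested start frame.
        wav.setpos(offset)
        # With samples=None, read everything from the offset to the end.
        count = wav.getnframes() - offset if samples is None else samples
        return wav.readframes(count), wav.getframerate()
```

Reading with `offset=2, samples=3` returns only frames 2, 3 and 4, so a short excerpt can be loaded without reading the whole file.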
load_clip(clip, audio_dir=None)
#
Load a clip from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip | Clip | The clip to load. | required |
audio_dir | Optional[PathLike] | The directory containing the audio file. If None, the recording path is assumed to be relative to the current working directory or an absolute path. | None |
Returns:
Name | Type | Description |
---|---|---|
audio | DataArray | The loaded clip. The returned clip stores the samplerate and time expansion of the recording from which it was extracted. |
load_recording(recording, audio_dir=None)
#
Load a recording from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
recording | Recording | The recording to load. | required |
audio_dir | Optional[PathLike] | The directory containing the audio file. If None, the recording path is assumed to be relative to the current working directory or an absolute path. | None |
Returns:
Name | Type | Description |
---|---|---|
audio | DataArray | The loaded recording. |
pcen(array, smooth=0.025, gain=0.98, bias=2, power=0.5, eps=1e-06, dim=Dimensions.time.value)
#
Apply PCEN to spectrogram.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | DataArray | The spectrogram to which to apply PCEN. | required |
smooth | float | The time constant for smoothing the input spectrogram. By default, 0.025. | 0.025 |
gain | float | The gain factor for the PCEN transform. By default, 0.98. | 0.98 |
bias | float | The bias factor for the PCEN transform. By default, 2. | 2 |
power | float | The power factor for the PCEN transform. By default, 0.5. | 0.5 |
eps | float | An epsilon value to prevent division by zero. By default, 1e-6. | 1e-06 |
dim | str | The dimension along which to apply PCEN. | Dimensions.time.value |
Returns:
Type | Description |
---|---|
DataArray | Spectrogram with PCEN applied. |
Notes
This function applies the Per-Channel Energy Normalization (PCEN) transform to a spectrogram, as described in [1].
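The PCEN transform is defined as:
\[
\mathrm{PCEN}(X) = \left( \frac{X}{(\epsilon + S)^{\alpha}} + \delta \right)^{r} - \delta^{r}
\]
where \(X\) is the input spectrogram, \(S\) is the smoothed version of the input spectrogram, \(\alpha\) is the gain factor, \(\delta\) is the bias factor, \(r\) is the power factor, and \(\epsilon\) is a small constant that prevents division by zero.
The smoothed version of the input spectrogram is computed using a first-order IIR filter:
\[
S_t = (1 - \beta) S_{t-1} + \beta X_t
\]
where \(\beta\) is the smoothing factor.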
The default values for the parameters are taken from the PCEN paper [1].
References
[1] Wang, Y., Getreuer, P., Hughes, T., Lyon, R. F., & Saurous, R. A. (2017, March). Trainable frontend for robust and far-field keyword spotting. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5670-5674). IEEE.
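The transform described in these notes can be sketched for a single frequency channel in plain Python, following the formulation in [1]. The function name `pcen_1d` is hypothetical, and the sketch makes two assumptions: the smoother state is initialised with the first frame, and `beta` is used directly as the smoothing coefficient (how the library derives it from the `smooth` argument is not stated here).

```python
def pcen_1d(energies, beta=0.025, gain=0.98, bias=2.0, power=0.5, eps=1e-6):
    """Apply PCEN to one frequency channel given as a list of energies.

    Illustrative only. Each output value is
    (x / (eps + s) ** gain + bias) ** power - bias ** power,
    where s is a first-order IIR smoothing of the input.
    """
    if not energies:
        return []
    smoothed = energies[0]  # assumed initial state of the smoother
    output = []
    for value in energies:
        # First-order IIR smoother: s_t = (1 - beta) * s_{t-1} + beta * x_t
        smoothed = (1 - beta) * smoothed + beta * value
        normalised = value / (eps + smoothed) ** gain
        output.append((normalised + bias) ** power - bias ** power)
    return output
```

A silent channel maps to zero because the two bias terms cancel, while a steady non-zero level is divided by its own smoothed history and so settles to a constant value largely independent of the absolute gain.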
resample(array, target_samplerate, window=None, dim=Dimensions.time.value)
#
Resample array data to a target sample rate along a given dimension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | DataArray | The data array to resample. | required |
target_samplerate | int | The target sample rate of the resampled data in Hz. | required |
window | Optional[str] | The window to use for resampling. See scipy.signal.resample for details. | None |
dim | str | The dimension along which to resample the audio data. By default, "time". | Dimensions.time.value |
Returns:
Type | Description |
---|---|
DataArray | The resampled audio data. |
Notes
This function uses scipy.signal.resample to resample the input data array to the target sample rate. scipy.signal.resample uses the Fourier method, which is suitable for resampling audio data. For other resampling methods, consider using the xarray.DataArray.interp method.
Raises:
Type | Description |
---|---|
ValueError | If the input audio object is not an xarray.DataArray. |
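Because scipy.signal.resample takes the desired number of output samples rather than a target rate, resampling to a given rate first requires converting the rate into a sample count. The arithmetic is sketched below. The helper name `resampled_length` is hypothetical, and rounding to the nearest integer is an assumption, as the rounding convention used by this function is not stated.

```python
def resampled_length(num_samples, samplerate, target_samplerate):
    """Return the number of samples after resampling to `target_samplerate`.

    The duration stays the same, so the sample count scales with the
    ratio of the two sample rates. Rounding is an assumed convention.
    """
    return round(num_samples * target_samplerate / samplerate)


# One second of audio at 48 kHz becomes 16000 samples at 16 kHz.
one_second = resampled_length(48000, 48000, 16000)
```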