Audio Preprocessing
audioclass.preprocess
Module for preprocessing audio data.
This module helps load audio data and preprocess it into a standardized format for audio classification models.
It provides functions for loading audio, resampling it, and framing it into fixed-length buffers.
Functions:
Name | Description |
---|---|
load_clip | Load an audio clip from a soundevent Clip object. |
load_recording | Load an audio recording from a soundevent Recording object. |
Functions
load_clip(clip, samplerate, buffer_size, audio_dir=None)
Load an audio clip from a soundevent Clip object.
This function loads the clip from the audio file, preprocesses it, and returns a numpy array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip | Clip | The soundevent Clip object. | required |
samplerate | int | The desired sample rate to resample the audio to. | required |
buffer_size | int | The length of each audio frame in samples. | required |
audio_dir | Optional[Path] | The directory containing the audio files. If not provided, the clip's default audio directory is used. | None |
Returns:
Type | Description |
---|---|
ndarray | A numpy array of shape (num_frames, buffer_size) containing the preprocessed audio data. |
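As a rough guide to the returned shape, num_frames can be estimated from the clip duration. The sketch below assumes the resampled signal is zero-padded up to a whole number of buffers, as described for stack_array. The helper name expected_num_frames is hypothetical and not part of audioclass, and the exact sample count after resampling may differ by a sample depending on rounding.

```python
import math


def expected_num_frames(duration_s: float, samplerate: int, buffer_size: int) -> int:
    """Estimate num_frames for a clip, assuming zero-padding to a full buffer."""
    # Approximate number of samples after resampling to `samplerate`.
    num_samples = round(duration_s * samplerate)
    # A partially filled last buffer still counts as one frame.
    return math.ceil(num_samples / buffer_size)


# A 10 s clip at 48 kHz framed into 3 s buffers (144000 samples each).
print(expected_num_frames(10.0, 48_000, 144_000))  # prints 4
```

Here 480000 samples fill three buffers completely and a fourth partially, so four frames are expected under this assumption.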
load_recording(recording, samplerate, buffer_size, audio_dir=None)
Load an audio recording from a soundevent Recording object.
This function loads the audio file, preprocesses it, and returns a numpy array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
recording | Recording | The soundevent Recording object. | required |
samplerate | int | The desired sample rate to resample the audio to. | required |
buffer_size | int | The length of each audio frame in samples. | required |
audio_dir | Optional[Path] | The directory containing the audio files. If not provided, the recording's default audio directory is used. | None |
Returns:
Type | Description |
---|---|
ndarray | A numpy array of shape (num_frames, buffer_size) containing the preprocessed audio data. |
preprocess_audio(wave, samplerate, buffer_size)
Preprocess a loaded audio waveform.
This function performs the following preprocessing steps:
- Selects the first channel if multiple channels are present.
- Resamples the audio to the specified sample rate.
- Frames the audio into fixed-length buffers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wave | DataArray | The loaded audio waveform. | required |
samplerate | int | The desired sample rate to resample the audio to. | required |
buffer_size | int | The length of each audio frame in samples. | required |
Returns:
Type | Description |
---|---|
ndarray | A numpy array of shape (num_frames, buffer_size) containing the preprocessed audio data. |
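The three preprocessing steps listed above can be sketched with plain numpy. This is an illustration only, not the library's implementation: it takes a bare ndarray plus an explicit orig_rate argument instead of a DataArray, assumes a (samples, channels) layout, and swaps in linear interpolation for resampling purely for brevity. The name preprocess_sketch is hypothetical.

```python
import numpy as np


def preprocess_sketch(
    wave: np.ndarray, orig_rate: int, samplerate: int, buffer_size: int
) -> np.ndarray:
    """Illustrative stand-in for the three steps; not the library's code."""
    # 1. Keep only the first channel (assumes shape (samples,) or (samples, channels)).
    if wave.ndim == 2:
        wave = wave[:, 0]

    # 2. Resample by linear interpolation (a simple stand-in technique).
    num_out = int(round(len(wave) * samplerate / orig_rate))
    old_t = np.arange(len(wave)) / orig_rate
    new_t = np.arange(num_out) / samplerate
    wave = np.interp(new_t, old_t, wave)

    # 3. Zero-pad to a multiple of buffer_size, then frame into rows.
    pad = -len(wave) % buffer_size
    wave = np.pad(wave, (0, pad))
    return wave.reshape(-1, buffer_size)


# One second of two-channel noise at 8 kHz, resampled to 4 kHz.
stereo = np.random.default_rng(0).normal(size=(8000, 2))
frames = preprocess_sketch(stereo, orig_rate=8000, samplerate=4000, buffer_size=1500)
print(frames.shape)  # prints (3, 1500)
```

The 4000 resampled samples do not divide evenly by 1500, so the last frame ends with 500 padded zeros.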
resample_audio(wave, samplerate)
stack_array(arr, buffer_size)
Stack a 1D array into a 2D array of fixed-length buffers.
This function pads the input array with zeros if necessary to ensure that the number of elements is divisible by the buffer size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arr | ndarray | The 1D array to stack. | required |
buffer_size | int | The length of each buffer. | required |
Returns:
Type | Description |
---|---|
ndarray | A 2D array of shape (num_buffers, buffer_size) containing the stacked buffers. |
Raises:
Type | Description |
---|---|
ValueError | If the input array has more than one dimension. |
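The pad-and-reshape behaviour described for stack_array can be illustrated with a minimal numpy sketch. The function name stack_array_sketch is hypothetical and the body is an assumption about how such a helper might look, not the library's source.

```python
import numpy as np


def stack_array_sketch(arr: np.ndarray, buffer_size: int) -> np.ndarray:
    """Pad a 1D array with zeros and reshape it into fixed-length rows."""
    # The documented error covers arrays with more than one dimension;
    # this sketch rejects anything that is not exactly 1D.
    if arr.ndim != 1:
        raise ValueError("Input array must be 1D.")

    # Number of zeros needed to reach the next multiple of buffer_size.
    pad = -arr.size % buffer_size
    padded = np.pad(arr, (0, pad))  # default mode pads with zeros
    return padded.reshape(-1, buffer_size)


stacked = stack_array_sketch(np.arange(1, 8), buffer_size=3)
print(stacked)
# [[1 2 3]
#  [4 5 6]
#  [7 0 0]]
```

Seven elements need two zeros of padding to fill three buffers of length three. When the length is already a multiple of buffer_size, pad is zero and no elements are added.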