Audio Preprocessing

audioclass.preprocess

Module for preprocessing audio data.

This module helps with loading audio data and preprocessing it into a standardized format for audio classification models.

It provides functions for loading audio, resampling it, and framing it into fixed-length buffers.

Functions:

| Name | Description |
|---|---|
| `load_clip` | Load an audio clip from a soundevent `Clip` object. |
| `load_recording` | Load an audio recording from a soundevent `Recording` object. |

Functions

load_clip(clip, samplerate, buffer_size, audio_dir=None)

Load an audio clip from a soundevent Clip object.

This function loads the clip's segment from its audio file, preprocesses it, and returns it as a NumPy array.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `clip` | `Clip` | The soundevent `Clip` object representing the audio segment. | required |
| `samplerate` | `int` | The desired sample rate to resample the audio to. | required |
| `buffer_size` | `int` | The length of each audio frame in samples. | required |
| `audio_dir` | `Optional[Path]` | The directory containing the audio files. If not provided, the clip's default audio directory is used. | `None` |

Returns:

| Type | Description |
|---|---|
| `ndarray` | A NumPy array of shape `(num_frames, buffer_size)` containing the preprocessed audio data. |
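The first dimension of the returned array depends on the clip's duration. The sketch below estimates it. `expected_num_frames` is a hypothetical helper, not part of audioclass, and it assumes that (a) the resampled clip contains `round(duration * samplerate)` samples, which may differ by a sample depending on the resampler, and (b) the framing step zero-pads a partial final frame, as `stack_array` does.

```python
def expected_num_frames(start_time: float, end_time: float, samplerate: int, buffer_size: int) -> int:
    """Hypothetical helper: estimate num_frames for a clip spanning [start_time, end_time) seconds."""
    # Assumed number of samples in the clip after resampling
    num_samples = round((end_time - start_time) * samplerate)
    # Integer ceiling division: a partial final frame is zero-padded into a full one
    return (num_samples + buffer_size - 1) // buffer_size


# A 2.5 s clip at 16 kHz is 40000 samples, i.e. 2.5 one-second frames, padded up to 3
print(expected_num_frames(2.5, 5.0, samplerate=16000, buffer_size=16000))  # → 3
```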

load_recording(recording, samplerate, buffer_size, audio_dir=None)

Load an audio recording from a soundevent Recording object.

This function loads the recording's audio file, preprocesses it, and returns it as a NumPy array.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `recording` | `Recording` | The soundevent `Recording` object representing the audio file. | required |
| `samplerate` | `int` | The desired sample rate to resample the audio to. | required |
| `buffer_size` | `int` | The length of each audio frame in samples. | required |
| `audio_dir` | `Optional[Path]` | The directory containing the audio files. If not provided, the recording's default audio directory is used. | `None` |

Returns:

| Type | Description |
|---|---|
| `ndarray` | A NumPy array of shape `(num_frames, buffer_size)` containing the preprocessed audio data. |
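The loading step itself is delegated to the file reader, not to this module. As a rough illustration of what "load the file, then keep the first channel" involves, here is a stand-in using Python's standard `wave` module and NumPy. It is not how audioclass or soundevent read audio. `read_first_channel` is a hypothetical name, and the sketch only handles 16-bit PCM WAV data.

```python
import io
import wave

import numpy as np


def read_first_channel(fileobj) -> tuple[np.ndarray, int]:
    """Hypothetical loader: read a 16-bit PCM WAV and return (first channel as float32, samplerate)."""
    with wave.open(fileobj, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        samplerate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores interleaved little-endian int16 samples: reshape to (frames, channels)
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    # Keep the first channel and scale int16 values to floats in [-1.0, 1.0)
    return data[:, 0].astype(np.float32) / 32768.0, samplerate


# Usage: write a tiny stereo WAV into memory, then read it back
buf = io.BytesIO()
left = np.array([0, 16384, -16384, 32767], dtype="<i2")
right = np.zeros(4, dtype="<i2")
with wave.open(buf, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(np.column_stack([left, right]).tobytes())
buf.seek(0)
samples, sr = read_first_channel(buf)
```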

preprocess_audio(wave, samplerate, buffer_size)

Preprocess a loaded audio waveform.

This function performs the following preprocessing steps:

  1. Selects the first channel if multiple channels are present.
  2. Resamples the audio to the specified sample rate.
  3. Frames the audio into fixed-length buffers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `wave` | `DataArray` | The loaded audio waveform. | required |
| `samplerate` | `int` | The desired sample rate to resample the audio to. | required |
| `buffer_size` | `int` | The length of each audio frame in samples. | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | A NumPy array of shape `(num_frames, buffer_size)` containing the preprocessed audio data. |
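The three preprocessing steps can be sketched with plain NumPy. This is a hypothetical stand-in, not the library's code. It takes a NumPy array shaped `(samples,)` or `(samples, channels)` plus an explicit source rate instead of an xarray `DataArray`, and it resamples by linear interpolation, which has no anti-aliasing filter and is chosen here only for brevity. The zero-padding of the final frame is assumed to go at the end.

```python
import numpy as np


def preprocess_sketch(wave: np.ndarray, orig_sr: int, samplerate: int, buffer_size: int) -> np.ndarray:
    """Hypothetical NumPy stand-in for preprocess_audio."""
    # 1. Select the first channel if multiple channels are present
    if wave.ndim == 2:
        wave = wave[:, 0]
    # 2. Resample by linear interpolation onto the target time grid (no anti-aliasing)
    num_out = int(round(wave.shape[0] * samplerate / orig_sr))
    old_t = np.arange(wave.shape[0]) / orig_sr
    new_t = np.arange(num_out) / samplerate
    resampled = np.interp(new_t, old_t, wave)
    # 3. Zero-pad the tail to a multiple of buffer_size, then reshape into frames
    remainder = resampled.shape[0] % buffer_size
    if remainder:
        resampled = np.pad(resampled, (0, buffer_size - remainder))
    return resampled.reshape(-1, buffer_size)


# Usage: 1 s of stereo audio at 8 kHz; channel 0 is a ramp, channel 1 is constant
stereo = np.column_stack([np.arange(8000) / 8000, np.ones(8000)])
frames = preprocess_sketch(stereo, orig_sr=8000, samplerate=4000, buffer_size=1500)
print(frames.shape)  # → (3, 1500)
```

At 4 kHz the second resulting second of audio has 4000 samples; with `buffer_size=1500` that is two full frames plus a 1000-sample remainder, which is padded with 500 zeros into a third frame.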

resample_audio(wave, samplerate)

Resample audio to a specific sample rate.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `wave` | `DataArray` | The audio waveform to resample. | required |
| `samplerate` | `int` | The target sample rate. | required |

Returns:

| Type | Description |
|---|---|
| `DataArray` | The resampled audio waveform. |
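This page does not say which resampling method `resample_audio` uses. One common band-limited approach is FFT resampling: truncate the spectrum to downsample (an ideal low-pass, so no aliasing) or zero-pad it to upsample. The sketch below applies that technique to a 1D NumPy array rather than a `DataArray`. `fft_resample` is a hypothetical name, and the sketch ignores the small correction usually applied to the Nyquist bin.

```python
import numpy as np


def fft_resample(x: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Hypothetical sketch: band-limited resampling of a 1D signal via its real FFT."""
    n = x.shape[0]
    m = int(round(n * target_sr / orig_sr))  # output length in samples
    spectrum = np.fft.rfft(x)
    out = np.zeros(m // 2 + 1, dtype=np.complex128)
    # Copy the shared low-frequency bins; dropping higher bins acts as an ideal low-pass
    k = min(spectrum.shape[0], out.shape[0])
    out[:k] = spectrum[:k]
    # irfft normalises by the output length, so rescale by m / n to keep the amplitude
    # (Nyquist-bin corrections are ignored in this sketch)
    return np.fft.irfft(out, n=m) * (m / n)


# Usage: 1 s of a 5 Hz sine sampled at 1000 Hz, resampled to 400 Hz and to 2000 Hz
t = np.arange(1000) / 1000
x = np.sin(2 * np.pi * 5 * t)
down = fft_resample(x, 1000, 400)
up = fft_resample(x, 1000, 2000)
```

The FFT method assumes the signal is periodic over its length, so real recordings can show artifacts at the edges; polyphase filtering is a common alternative.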

stack_array(arr, buffer_size)

Stack a 1D array into a 2D array of fixed-length buffers.

This function pads the input array with zeros if necessary to ensure that the number of elements is divisible by the buffer size.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `arr` | `ndarray` | The 1D array to stack. | required |
| `buffer_size` | `int` | The length of each buffer. | required |

Returns:

| Type | Description |
|---|---|
| `ndarray` | A 2D array of shape `(num_buffers, buffer_size)` containing the stacked buffers. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input array has more than one dimension. |
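The pad-then-reshape behaviour described above can be sketched in a few lines of NumPy. `stack_array_sketch` is a hypothetical re-implementation for illustration, not the library's source, and it assumes the zeros are appended at the end of the array.

```python
import numpy as np


def stack_array_sketch(arr: np.ndarray, buffer_size: int) -> np.ndarray:
    """Hypothetical re-implementation: zero-pad a 1D array and split it into rows of buffer_size."""
    if arr.ndim != 1:
        raise ValueError("expected a 1D array")
    remainder = arr.shape[0] % buffer_size
    if remainder:
        # Append zeros so the length becomes a multiple of buffer_size
        arr = np.pad(arr, (0, buffer_size - remainder))
    # Row i holds samples [i * buffer_size, (i + 1) * buffer_size)
    return arr.reshape(-1, buffer_size)


# Ten samples with buffer_size=4: two full rows plus [8, 9] padded to [8, 9, 0, 0]
frames = stack_array_sketch(np.arange(10), 4)
```

Because `reshape` on the padded array returns a view where possible, the framing itself adds no copy beyond the padding.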