audioclass.models.base #

Module defining the base classes for audio classification models and their output format.

This module provides abstract classes for clip classification models, establishing a standard interface for model input, processing, and output. It also defines the structure of the model output, which includes class probabilities and extracted features.

Classes:

ClipClassificationModel
    Abstract base class for audio clip classification models.

ModelOutput
    Output format for audio classification models.

Classes#

ClipClassificationModel #

Bases: ABC

Abstract base class for audio clip classification models.

This class defines the common interface for audio classification models that process individual clips. It provides methods for processing raw audio arrays, files, recordings, and clips, as well as an iterable of clips.

Methods:

process_array
    Process a single audio array and return the model output.

process_clip
    Process an audio clip and return the model output.

process_file
    Process an audio file and return the model output.

process_iterable
    Process an iterable of audio clips and return a list of predictions.

process_recording
    Process an audio recording and return the model output.
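To make the interface concrete, here is a minimal, self-contained sketch of a subclass that implements the one abstract method, `process_array`. The base class and `ModelOutput` below are stand-ins that mirror the documented interface rather than imports from `audioclass` itself, and `RandomModel` with its parameter values is entirely hypothetical:

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

import numpy as np


class ModelOutput(NamedTuple):
    """Stand-in mirroring the documented ModelOutput fields."""

    class_probs: np.ndarray
    features: np.ndarray


class ClipClassificationModel(ABC):
    """Stand-in mirroring the documented abstract base class."""

    name: str
    samplerate: int
    input_samples: int
    num_classes: int
    confidence_threshold: float
    batch_size: int = 8  # documented class-attribute default

    @abstractmethod
    def process_array(self, array: np.ndarray) -> ModelOutput:
        """Process an array of shape (num_frames, input_samples)."""


class RandomModel(ClipClassificationModel):
    """Toy concrete model: random probabilities, zero features."""

    name = "random"
    samplerate = 16_000
    input_samples = 16_000  # one-second frames at 16 kHz
    num_classes = 3
    confidence_threshold = 0.5

    def process_array(self, array: np.ndarray) -> ModelOutput:
        num_frames = array.shape[0]
        rng = np.random.default_rng(0)
        probs = rng.random((num_frames, self.num_classes))
        probs /= probs.sum(axis=1, keepdims=True)  # each row sums to 1
        features = np.zeros((num_frames, 8))
        return ModelOutput(class_probs=probs, features=features)


model = RandomModel()
output = model.process_array(np.zeros((4, model.input_samples)))
print(output.class_probs.shape)  # (4, 3)
print(output.features.shape)  # (4, 8)
```

A real subclass would instead inherit from `audioclass.models.base.ClipClassificationModel` and run an actual inference backend inside `process_array`; the convenience methods (`process_file`, `process_clip`, etc.) then build on that single abstract method.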

Attributes:

batch_size (int)
    The maximum number of frames to process in each batch.

confidence_threshold (float)
    The minimum confidence threshold for a class to be considered.

input_samples (int)
    The number of audio samples expected in each input frame.

name (str)
    The name of the model.

num_classes (int)
    The number of classes that the model can predict.

samplerate (int)
    The sample rate of the audio data expected by the model (in Hz).

tags (List[Tag])
    The list of tags that the model can predict.

Attributes#
batch_size: int = 8 class-attribute instance-attribute #

The maximum number of frames to process in each batch.

confidence_threshold: float instance-attribute #

The minimum confidence threshold for a class to be considered.

input_samples: int instance-attribute #

The number of audio samples expected in each input frame.

name: str instance-attribute #

The name of the model.

num_classes: int instance-attribute #

The number of classes that the model can predict.

samplerate: int instance-attribute #

The sample rate of the audio data expected by the model (in Hz).

tags: List[data.Tag] instance-attribute #

The list of tags that the model can predict.
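The `confidence_threshold` and `tags` attributes are typically used together to turn a row of class probabilities into predicted tags. A NumPy-only sketch (the tag names are hypothetical stand-ins for `data.Tag` objects, and concrete models may apply the threshold differently):

```python
import numpy as np

confidence_threshold = 0.5  # documented: minimum score for a class to count
tags = ["dog", "cat", "bird"]  # hypothetical stand-ins for data.Tag objects

# One frame's class probabilities, shape (num_classes,).
class_probs = np.array([0.7, 0.1, 0.6])

# Keep only classes whose probability clears the threshold.
keep = class_probs >= confidence_threshold
predicted = [(tags[i], float(class_probs[i])) for i in np.flatnonzero(keep)]
print(predicted)  # [('dog', 0.7), ('bird', 0.6)]
```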

Functions#
process_array(array) abstractmethod #

Process a single audio array and return the model output.

Parameters:

array (ndarray, required)
    The audio array to be processed, with shape (num_frames, input_samples).

Returns:

ModelOutput
    A ModelOutput object containing the class probabilities and extracted features.
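Note that `process_array` expects the audio already split into fixed-length frames. A sketch of reshaping a raw 1-D signal into the documented `(num_frames, input_samples)` shape, zero-padding the tail (the padding strategy is an assumption, not necessarily what concrete models do):

```python
import numpy as np

input_samples = 16_000  # hypothetical frame length, e.g. one second at 16 kHz

audio = np.random.default_rng(0).normal(size=40_000)  # raw 1-D signal

# Pad to a whole number of frames, then reshape to (num_frames, input_samples).
num_frames = int(np.ceil(audio.size / input_samples))
padded = np.zeros(num_frames * input_samples)
padded[: audio.size] = audio
frames = padded.reshape(num_frames, input_samples)

print(frames.shape)  # (3, 16000)
```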

process_clip(clip, fmt='soundevent') #
process_clip(clip: data.Clip, fmt: Literal['soundevent'] = 'soundevent') -> List[data.ClipPrediction]
process_clip(clip: data.Clip, fmt: Literal['dataset'] = 'dataset') -> xr.Dataset

Process an audio clip and return the model output.

Parameters:

clip (Clip, required)
    The Clip object representing the audio segment.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.
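The two overload signatures above let type checkers infer the return type from the `fmt` literal. The same pattern can be sketched with `typing.overload`; here `list` and `dict` are toy stand-ins for `List[ClipPrediction]` and `xr.Dataset`, and the returned values are fabricated:

```python
from typing import Literal, Union, overload


@overload
def process(fmt: Literal["soundevent"] = "soundevent") -> list: ...
@overload
def process(fmt: Literal["dataset"]) -> dict: ...


def process(fmt: str = "soundevent") -> Union[list, dict]:
    # list stands in for List[ClipPrediction], dict for xr.Dataset.
    if fmt == "soundevent":
        return [{"tag": "dog", "score": 0.9}]
    if fmt == "dataset":
        return {"probabilities": [0.9]}
    raise ValueError(f"Unknown format: {fmt!r}")


print(type(process()).__name__)  # list
print(type(process(fmt="dataset")).__name__)  # dict
```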

process_file(path, fmt='soundevent', **kwargs) #
process_file(path: Path, fmt: Literal['soundevent'] = 'soundevent', **kwargs) -> List[data.ClipPrediction]
process_file(path: Path, fmt: Literal['dataset'] = 'dataset', **kwargs) -> xr.Dataset

Process an audio file and return the model output.

Parameters:

path (Path, required)
    The path to the audio file.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

**kwargs (optional)
    Additional keyword arguments to pass to Recording.from_file().

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.

process_iterable(iterable) #

Process an iterable of audio clips and return a list of predictions.

Parameters:

iterable (BaseIterator, required)
    An iterator that yields Clip objects.

Returns:

List[ClipPrediction]
    A list of ClipPrediction objects, one for each clip in the iterable.
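Models with a `batch_size` attribute process frames in chunks of at most that many at a time. A NumPy-only sketch of that batching loop (the concrete implementation in audioclass may differ):

```python
import numpy as np

batch_size = 8  # documented default
frames = np.zeros((20, 16_000))  # 20 frames of hypothetical length 16000

# Split the frames into batches of at most batch_size rows each.
batches = [
    frames[start : start + batch_size]
    for start in range(0, frames.shape[0], batch_size)
]

print([b.shape[0] for b in batches])  # [8, 8, 4]
```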

process_recording(recording, fmt='soundevent') #
process_recording(recording: data.Recording, fmt: Literal['soundevent'] = 'soundevent') -> List[data.ClipPrediction]
process_recording(recording: data.Recording, fmt: Literal['dataset'] = 'dataset') -> xr.Dataset

Process an audio recording and return the model output.

Parameters:

recording (Recording, required)
    The Recording object representing the audio.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.

ModelOutput #

Bases: NamedTuple

Output format for audio classification models.

Attributes:

class_probs (ndarray)
    Array of class probabilities for each frame.

features (ndarray)
    Array of extracted features for each frame.

Attributes#
class_probs: np.ndarray instance-attribute #

Array of class probabilities for each frame.

The array has shape (num_frames, num_classes), where num_frames is the number of frames in the input audio clip and num_classes is the number of classes that the model can predict.

Note that the interpretation of these probabilities can vary between models; check the specific model's documentation for details.

features: np.ndarray instance-attribute #

Array of extracted features for each frame.

The array has shape (num_frames, num_features), where num_frames is the number of frames in the input audio clip and num_features is the number of features extracted by the model.

The features can be used for further analysis or visualization of the model output.
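Because `ModelOutput` is a `NamedTuple`, it can be unpacked positionally as well as accessed by field name. A sketch that builds one with the documented shapes (values fabricated; the stand-in class mirrors the real one in `audioclass.models.base`) and pools per-frame features into a single clip-level vector, a common downstream use rather than something `ModelOutput` does itself:

```python
from typing import NamedTuple

import numpy as np


class ModelOutput(NamedTuple):
    # Mirrors the documented fields of audioclass.models.base.ModelOutput.
    class_probs: np.ndarray  # shape (num_frames, num_classes)
    features: np.ndarray  # shape (num_frames, num_features)


output = ModelOutput(
    class_probs=np.array([[0.9, 0.1], [0.2, 0.8]]),
    features=np.array([[1.0, 2.0], [3.0, 4.0]]),
)

# Tuple-style unpacking works alongside named access (output.features).
class_probs, features = output

# Pool per-frame features into one clip-level embedding.
clip_embedding = features.mean(axis=0)
print(clip_embedding)  # [2. 3.]
```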
