audioclass.models.base

Module defining the base classes for audio classification models and their output format.

This module provides abstract classes for clip classification models, establishing a standard interface for model input, processing, and output. It also defines the structure of the model output, which includes class probabilities and extracted features.


ClipClassificationModel

Bases: ABC

Abstract base class for audio clip classification models.

This class defines the common interface for audio classification models that process individual clips. It provides methods for processing raw audio arrays, audio files, recordings, and clips, as well as iterables of clips.

confidence_threshold: float instance-attribute

The minimum confidence score a class must reach to be considered in the predictions.

input_samples: int instance-attribute

The number of audio samples expected in each input frame.

name: str instance-attribute

The name of the model.

num_classes: int instance-attribute

The number of classes that the model can predict.

samplerate: int instance-attribute

The sample rate of the audio data expected by the model (in Hz).
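To make the relationship between samplerate and input_samples concrete, the sketch below computes the duration of one input frame and splits a 1-D waveform into an array of shape (num_frames, input_samples). The numeric values and the helper name `frame_waveform` are illustrative assumptions, not taken from any particular model, and zero-padding the final frame is only one possible policy; the library may frame audio differently.

```python
import numpy as np

# Illustrative values only; a real model defines its own.
samplerate = 48_000      # Hz
input_samples = 144_000  # samples per input frame

# Duration of one frame in seconds.
frame_duration = input_samples / samplerate


def frame_waveform(wave: np.ndarray, input_samples: int) -> np.ndarray:
    """Split a 1-D waveform into frames, zero-padding the last frame.

    Hypothetical helper for illustration; not part of audioclass.
    """
    # Ceiling division, with at least one frame even for empty input.
    num_frames = max(1, -(-len(wave) // input_samples))
    padded = np.zeros(num_frames * input_samples, dtype=wave.dtype)
    padded[: len(wave)] = wave
    return padded.reshape(num_frames, input_samples)


wave = np.random.default_rng(0).normal(size=300_000).astype(np.float32)
frames = frame_waveform(wave, input_samples)
```

With these assumed values each frame covers 3 seconds of audio, and the 300,000-sample waveform yields three frames, the last of which is partly padding.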

tags: List[data.Tag] instance-attribute

The list of tags that the model can predict.
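As a rough illustration of how tags, num_classes, and confidence_threshold relate, the sketch below pairs each class score with its tag and keeps the pairs that reach the threshold. The tags are plain strings standing in for data.Tag objects, all values are made up, and treating the threshold as inclusive is an assumption rather than documented behaviour.

```python
import numpy as np

# Hypothetical stand-ins for the model attributes described above.
tags = ["species:a", "species:b", "species:c"]  # one entry per class
confidence_threshold = 0.5

# Class probabilities for a single frame, shape (num_classes,).
probs = np.array([0.10, 0.72, 0.55])

# Keep (tag, score) pairs whose score reaches the threshold.
# Whether the comparison is inclusive is an assumption of this sketch.
kept = [
    (tag, float(score))
    for tag, score in zip(tags, probs)
    if score >= confidence_threshold
]
```

Here the first class falls below the threshold and is dropped, while the other two are kept.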

process_array(array) abstractmethod

Process a single audio array and return the model output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| array | ndarray | The audio array to be processed, with shape (num_frames, input_samples). | required |

Returns:

| Type | Description |
| --- | --- |
| ModelOutput | A ModelOutput object containing the class probabilities and extracted features. |
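The shape contract of process_array can be sketched with a toy implementation. The code below does not import audioclass: `ModelOutput` and `ClipModelSketch` are local stand-ins that mirror only the fields and method named in this page, and `RandomProjectionModel` is a hypothetical model (a fixed random projection followed by a softmax) chosen purely to show the input and output shapes.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

import numpy as np


class ModelOutput(NamedTuple):
    """Local stand-in mirroring the documented ModelOutput fields."""

    class_probs: np.ndarray  # shape (num_frames, num_classes)
    features: np.ndarray     # shape (num_frames, num_features)


class ClipModelSketch(ABC):
    """Local stand-in for the documented interface, not the real base class."""

    input_samples: int
    num_classes: int

    @abstractmethod
    def process_array(self, array: np.ndarray) -> ModelOutput: ...


class RandomProjectionModel(ClipModelSketch):
    """Toy model: fixed random projection followed by a softmax."""

    def __init__(self, input_samples=16, num_features=4, num_classes=3):
        rng = np.random.default_rng(0)
        self.input_samples = input_samples
        self.num_classes = num_classes
        self._embed = rng.normal(size=(input_samples, num_features))
        self._head = rng.normal(size=(num_features, num_classes))

    def process_array(self, array: np.ndarray) -> ModelOutput:
        # Enforce the documented input shape (num_frames, input_samples).
        if array.ndim != 2 or array.shape[1] != self.input_samples:
            raise ValueError("expected shape (num_frames, input_samples)")
        features = array @ self._embed
        logits = features @ self._head
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return ModelOutput(exp / exp.sum(axis=1, keepdims=True), features)


model = RandomProjectionModel()
output = model.process_array(np.ones((5, 16)))
```

For five input frames, the toy model returns class_probs of shape (5, 3) and features of shape (5, 4), with each row of class_probs summing to one. Real models need not produce normalised probabilities.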

process_clip(clip, fmt='soundevent')

Process an audio clip and return the model output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| clip | Clip | The Clip object representing the audio segment. | required |
| fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset. | 'soundevent' |

Returns:

| Type | Description |
| --- | --- |
| Union[List[ClipPrediction], Dataset] | The model output in the specified format. |

process_file(path, fmt='soundevent', **kwargs)

Process an audio file and return the model output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | The path to the audio file. | required |
| fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset. | 'soundevent' |
| **kwargs | | Additional keyword arguments to pass to Recording.from_file(). | |

Returns:

| Type | Description |
| --- | --- |
| Union[List[ClipPrediction], Dataset] | The model output in the specified format. |

process_iterable(iterable)

Process an iterable of audio clips and return a list of predictions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| iterable | BaseIterator | An iterator that yields Clip objects. | required |

Returns:

| Type | Description |
| --- | --- |
| List[ClipPrediction] | A list of ClipPrediction objects, one for each clip in the iterable. |

process_recording(recording, fmt='soundevent')

Process an audio recording and return the model output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| recording | Recording | The Recording object representing the audio. | required |
| fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset. | 'soundevent' |

Returns:

| Type | Description |
| --- | --- |
| Union[List[ClipPrediction], Dataset] | The model output in the specified format. |

ModelOutput

Bases: NamedTuple

Output format for audio classification models.

class_probs: np.ndarray instance-attribute

Array of class probabilities for each frame.

The array has shape (num_frames, num_classes), where num_frames is the number of frames in the input audio clip and num_classes is the number of classes that the model can predict.

Note that the interpretation of these values may vary between models, so check the model's documentation for details.
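A short sketch of reading a class_probs array of shape (num_frames, num_classes): the values below are made up, and taking the per-class maximum over frames is only one possible way to summarise a clip (a mean over frames is another), not necessarily what any given model or the library does.

```python
import numpy as np

# Made-up class probabilities: 4 frames, 3 classes.
class_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.8, 0.1],
])

# Index of the highest-scoring class in each frame, shape (num_frames,).
top_class_per_frame = class_probs.argmax(axis=1)

# One possible clip-level summary: best score per class across frames,
# shape (num_classes,).
clip_scores = class_probs.max(axis=0)
```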

features: np.ndarray instance-attribute

Array of extracted features for each frame.

The array has shape (num_frames, num_features), where num_frames is the number of frames in the input audio clip and num_features is the number of features extracted by the model.

The features can be used for further analysis or visualization of the model output.
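As one example of such further analysis, the sketch below averages per-frame features into a single clip-level vector and compares two clips by cosine similarity. The feature arrays are random placeholders, and the helper names `clip_embedding` and `cosine_similarity` are hypothetical, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder feature arrays of shape (num_frames, num_features).
features_a = rng.normal(size=(4, 8))
features_b = rng.normal(size=(6, 8))


def clip_embedding(features: np.ndarray) -> np.ndarray:
    """Average frame-level features into one vector per clip."""
    return features.mean(axis=0)


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


emb_a = clip_embedding(features_a)
emb_b = clip_embedding(features_b)
similarity = cosine_similarity(emb_a, emb_b)
```

Averaging discards the ordering of frames, so it suits clip-level comparison or clustering better than analyses that depend on timing within a clip.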
