audioclass.models.base #

Module defining the base classes for audio classification models and their output format.

This module provides abstract classes for clip classification models, establishing a standard interface for model input, processing, and output. It also defines the structure of the model output, which includes class probabilities and extracted features.

Classes:

ClipClassificationModel
    Abstract base class for audio clip classification models.

ModelOutput
    Output format for audio classification models.

Classes#

ClipClassificationModel #

Bases: ABC

Abstract base class for audio clip classification models.

This class defines the common interface for audio classification models that process individual clips. It provides methods for processing raw audio arrays, files, recordings, and clips, as well as an iterable of clips.

Methods:

process_array
    Process a single audio array and return the model output.

process_clip
    Process an audio clip and return the model output.

process_file
    Process an audio file and return the model output.

process_iterable
    Process an iterable of audio clips and return a list of predictions.

process_recording
    Process an audio recording and return the model output.
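To make the interface concrete, here is a minimal, self-contained sketch of a subclass that implements the one abstract method, `process_array`. The base class and `ModelOutput` below are stand-ins that mirror the documented interface rather than imports from `audioclass` itself, and `RandomModel` with its parameter values is entirely hypothetical:

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

import numpy as np


class ModelOutput(NamedTuple):
    """Stand-in mirroring the documented ModelOutput fields."""

    class_probs: np.ndarray
    features: np.ndarray


class ClipClassificationModel(ABC):
    """Stand-in mirroring the documented abstract base class."""

    name: str
    samplerate: int
    input_samples: int
    num_classes: int
    confidence_threshold: float
    batch_size: int = 8  # documented class-attribute default

    @abstractmethod
    def process_array(self, array: np.ndarray) -> ModelOutput:
        """Process an array of shape (num_frames, input_samples)."""


class RandomModel(ClipClassificationModel):
    """Toy concrete model: random probabilities, zero features."""

    name = "random"
    samplerate = 16_000
    input_samples = 16_000  # one-second frames at 16 kHz
    num_classes = 3
    confidence_threshold = 0.5

    def process_array(self, array: np.ndarray) -> ModelOutput:
        num_frames = array.shape[0]
        rng = np.random.default_rng(0)
        probs = rng.random((num_frames, self.num_classes))
        probs /= probs.sum(axis=1, keepdims=True)  # each row sums to 1
        features = np.zeros((num_frames, 8))
        return ModelOutput(class_probs=probs, features=features)


model = RandomModel()
output = model.process_array(np.zeros((4, model.input_samples)))
print(output.class_probs.shape)  # (4, 3)
print(output.features.shape)  # (4, 8)
```

A real subclass would instead inherit from `audioclass.models.base.ClipClassificationModel` and run an actual inference backend inside `process_array`; the convenience methods (`process_file`, `process_clip`, etc.) then build on that single abstract method.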

Attributes:

batch_size (int)
    The maximum number of frames to process in each batch.

confidence_threshold (float)
    The minimum confidence threshold for a class to be considered.

input_samples (int)
    The number of audio samples expected in each input frame.

name (str)
    The name of the model.

num_classes (int)
    The number of classes that the model can predict.

samplerate (int)
    The sample rate of the audio data expected by the model (in Hz).

tags (List[Tag])
    The list of tags that the model can predict.

Attributes#
batch_size: int = 8 class-attribute instance-attribute #

The maximum number of frames to process in each batch.

confidence_threshold: float instance-attribute #

The minimum confidence threshold for a class to be considered.

input_samples: int instance-attribute #

The number of audio samples expected in each input frame.

name: str instance-attribute #

The name of the model.

num_classes: int instance-attribute #

The number of classes that the model can predict.

samplerate: int instance-attribute #

The sample rate of the audio data expected by the model (in Hz).

tags: List[data.Tag] instance-attribute #

The list of tags that the model can predict.
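The `confidence_threshold` and `tags` attributes are typically used together to turn a row of class probabilities into predicted tags. A NumPy-only sketch (the tag names are hypothetical stand-ins for `data.Tag` objects, and concrete models may apply the threshold differently):

```python
import numpy as np

confidence_threshold = 0.5  # documented: minimum score for a class to count
tags = ["dog", "cat", "bird"]  # hypothetical stand-ins for data.Tag objects

# One frame's class probabilities, shape (num_classes,).
class_probs = np.array([0.7, 0.1, 0.6])

# Keep only classes whose probability clears the threshold.
keep = class_probs >= confidence_threshold
predicted = [(tags[i], float(class_probs[i])) for i in np.flatnonzero(keep)]
print(predicted)  # [('dog', 0.7), ('bird', 0.6)]
```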

Functions#
process_array(array) abstractmethod #

Process a single audio array and return the model output.

Parameters:

array (ndarray, required)
    The audio array to be processed, with shape (num_frames, input_samples).

Returns:

ModelOutput
    A ModelOutput object containing the class probabilities and extracted features.
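Note that `process_array` expects the audio already split into fixed-length frames. A sketch of reshaping a raw 1-D signal into the documented `(num_frames, input_samples)` shape, zero-padding the tail (the padding strategy is an assumption, not necessarily what concrete models do):

```python
import numpy as np

input_samples = 16_000  # hypothetical frame length, e.g. one second at 16 kHz

audio = np.random.default_rng(0).normal(size=40_000)  # raw 1-D signal

# Pad to a whole number of frames, then reshape to (num_frames, input_samples).
num_frames = int(np.ceil(audio.size / input_samples))
padded = np.zeros(num_frames * input_samples)
padded[: audio.size] = audio
frames = padded.reshape(num_frames, input_samples)

print(frames.shape)  # (3, 16000)
```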

process_clip(clip, fmt='soundevent') #
process_clip(clip: data.Clip, fmt: Literal['soundevent'] = 'soundevent') -> List[data.ClipPrediction]
process_clip(clip: data.Clip, fmt: Literal['dataset'] = 'dataset') -> xr.Dataset

Process an audio clip and return the model output.

Parameters:

clip (Clip, required)
    The Clip object representing the audio segment.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.
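The two overload signatures above let type checkers infer the return type from the `fmt` literal. The same pattern can be sketched with `typing.overload`; here `list` and `dict` are toy stand-ins for `List[ClipPrediction]` and `xr.Dataset`, and the returned values are fabricated:

```python
from typing import Literal, Union, overload


@overload
def process(fmt: Literal["soundevent"] = "soundevent") -> list: ...
@overload
def process(fmt: Literal["dataset"]) -> dict: ...


def process(fmt: str = "soundevent") -> Union[list, dict]:
    # list stands in for List[ClipPrediction], dict for xr.Dataset.
    if fmt == "soundevent":
        return [{"tag": "dog", "score": 0.9}]
    if fmt == "dataset":
        return {"probabilities": [0.9]}
    raise ValueError(f"Unknown format: {fmt!r}")


print(type(process()).__name__)  # list
print(type(process(fmt="dataset")).__name__)  # dict
```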

process_file(path, fmt='soundevent', **kwargs) #
process_file(path: Path, fmt: Literal['soundevent'] = 'soundevent', **kwargs) -> List[data.ClipPrediction]
process_file(path: Path, fmt: Literal['dataset'] = 'dataset', **kwargs) -> xr.Dataset

Process an audio file and return the model output.

Parameters:

path (Path, required)
    The path to the audio file.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

**kwargs (optional)
    Additional keyword arguments to pass to Recording.from_file().

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.

process_iterable(iterable) #

Process an iterable of audio clips and return a list of predictions.

Parameters:

iterable (BaseIterator, required)
    An iterator that yields Clip objects.

Returns:

List[ClipPrediction]
    A list of ClipPrediction objects, one for each clip in the iterable.
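Models with a `batch_size` attribute process frames in chunks of at most that many at a time. A NumPy-only sketch of that batching loop (the concrete implementation in audioclass may differ):

```python
import numpy as np

batch_size = 8  # documented default
frames = np.zeros((20, 16_000))  # 20 frames of hypothetical length 16000

# Split the frames into batches of at most batch_size rows each.
batches = [
    frames[start : start + batch_size]
    for start in range(0, frames.shape[0], batch_size)
]

print([b.shape[0] for b in batches])  # [8, 8, 4]
```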

process_recording(recording, fmt='soundevent') #
process_recording(recording: data.Recording, fmt: Literal['soundevent'] = 'soundevent') -> List[data.ClipPrediction]
process_recording(recording: data.Recording, fmt: Literal['dataset'] = 'dataset') -> xr.Dataset

Process an audio recording and return the model output.

Parameters:

recording (Recording, required)
    The Recording object representing the audio.

fmt (Literal['soundevent', 'dataset'], default: 'soundevent')
    The desired output format. "soundevent" returns a list of ClipPrediction objects, while "dataset" returns an xarray Dataset.

Returns:

Union[List[ClipPrediction], Dataset]
    The model output in the specified format.

ModelOutput #

Bases: NamedTuple

Output format for audio classification models.

Attributes:

class_probs (ndarray)
    Array of class probabilities for each frame.

features (ndarray)
    Array of extracted features for each frame.

Attributes#
class_probs: np.ndarray instance-attribute #

Array of class probabilities for each frame.

The array has shape (num_frames, num_classes), where num_frames is the number of frames in the input audio clip and num_classes is the number of classes that the model can predict.

Note that the interpretation of these probabilities can vary between models; check the specific model's documentation for details.

features: np.ndarray instance-attribute #

Array of extracted features for each frame.

The array has shape (num_frames, num_features), where num_frames is the number of frames in the input audio clip and num_features is the number of features extracted by the model.

The features can be used for further analysis or visualization of the model output.
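Because `ModelOutput` is a `NamedTuple`, it can be unpacked positionally as well as accessed by field name. A sketch that builds one with the documented shapes (values fabricated; the stand-in class mirrors the real one in `audioclass.models.base`) and pools per-frame features into a single clip-level vector, a common downstream use rather than something `ModelOutput` does itself:

```python
from typing import NamedTuple

import numpy as np


class ModelOutput(NamedTuple):
    # Mirrors the documented fields of audioclass.models.base.ModelOutput.
    class_probs: np.ndarray  # shape (num_frames, num_classes)
    features: np.ndarray  # shape (num_frames, num_features)


output = ModelOutput(
    class_probs=np.array([[0.9, 0.1], [0.2, 0.8]]),
    features=np.array([[1.0, 2.0], [3.0, 4.0]]),
)

# Tuple-style unpacking works alongside named access (output.features).
class_probs, features = output

# Pool per-frame features into one clip-level embedding.
clip_embedding = features.mean(axis=0)
print(clip_embedding)  # [2. 3.]
```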
