base
audioclass.models.base
Module defining the base classes for audio classification models and their output format.
This module provides abstract classes for clip classification models, establishing a standard interface for model input, processing, and output. It also defines the structure of the model output, which includes class probabilities and extracted features.
Classes:
Name | Description |
---|---|
ClipClassificationModel | Abstract base class for audio clip classification models. |
ModelOutput | Output format for audio classification models. |
Classes
ClipClassificationModel
Bases: ABC
Abstract base class for audio clip classification models.
This class defines the common interface for audio classification models that process individual clips. It provides methods for processing raw audio arrays, files, recordings, and clips, as well as an iterable of clips.
Methods:
Name | Description |
---|---|
process_array | Process a single audio array and return the model output. |
process_clip | Process an audio clip and return the model output. |
process_file | Process an audio file and return the model output. |
process_iterable | Process an iterable of audio clips and return a list of predictions. |
process_recording | Process an audio recording and return the model output. |
Attributes:
Name | Type | Description |
---|---|---|
batch_size | int | The maximum number of frames to process in each batch. |
confidence_threshold | float | The minimum confidence threshold for a class to be considered. |
input_samples | int | The number of audio samples expected in each input frame. |
name | str | The name of the model. |
num_classes | int | The number of classes that the model can predict. |
samplerate | int | The sample rate of the audio data expected by the model (in Hz). |
tags | List[Tag] | The list of tags that the model can predict. |
Attributes
batch_size: int = 8
class-attribute
instance-attribute
The maximum number of frames to process in each batch.
confidence_threshold: float
instance-attribute
The minimum confidence threshold for a class to be considered.
input_samples: int
instance-attribute
The number of audio samples expected in each input frame.
name: str
instance-attribute
The name of the model.
num_classes: int
instance-attribute
The number of classes that the model can predict.
samplerate: int
instance-attribute
The sample rate of the audio data expected by the model (in Hz).
tags: List[data.Tag]
instance-attribute
The list of tags that the model can predict.
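The attributes above jointly determine how much audio one frame covers and how many batches a set of frames needs. A rough sketch of that arithmetic follows. It uses only the documented attribute names; the helper names `frame_duration` and `num_batches` and the concrete values (48 kHz, 144000 samples, 20 frames) are illustrative assumptions, not functions or defaults taken from audioclass.

```python
import math


def frame_duration(input_samples: int, samplerate: int) -> float:
    """Duration in seconds of one input frame (hypothetical helper)."""
    return input_samples / samplerate


def num_batches(num_frames: int, batch_size: int = 8) -> int:
    """Batches needed when each batch holds at most `batch_size` frames."""
    return math.ceil(num_frames / batch_size)


# Assumed example values: a model expecting 144000 samples at 48 kHz.
duration = frame_duration(input_samples=144_000, samplerate=48_000)
batches = num_batches(num_frames=20, batch_size=8)
print(duration, batches)
```

With these assumed values each frame spans 3 seconds, and 20 frames need three batches because the last batch is only partly filled.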
Functions
process_array(array)
abstractmethod
Process a single audio array and return the model output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | ndarray | The audio array to be processed, with shape | required |
Returns:
Type | Description |
---|---|
ModelOutput | A |
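Since `process_array` is the only abstract method, a concrete model mainly has to supply it. The sketch below shows the general shape of such a subclass without importing audioclass: `SketchBase` and `SketchOutput` are local stand-ins that mirror only the documented names, `UniformModel` is hypothetical, and the 2-D input layout (one row per frame) is an assumption, because the expected array shape is not stated here.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple

import numpy as np


class SketchOutput(NamedTuple):
    """Local stand-in mirroring the documented ModelOutput fields."""

    class_probs: np.ndarray  # shape (num_frames, num_classes)
    features: np.ndarray  # shape (num_frames, num_features)


class SketchBase(ABC):
    """Local stand-in exposing only the documented abstract method."""

    batch_size: int = 8

    @abstractmethod
    def process_array(self, array: np.ndarray) -> SketchOutput: ...


class UniformModel(SketchBase):
    """Hypothetical model that gives every class the same probability."""

    def __init__(self, input_samples: int, num_classes: int) -> None:
        self.name = "uniform"
        self.input_samples = input_samples
        self.num_classes = num_classes

    def process_array(self, array: np.ndarray) -> SketchOutput:
        # Assumption: one row per frame, `input_samples` columns.
        frames = np.atleast_2d(array)
        if frames.shape[1] != self.input_samples:
            raise ValueError("unexpected number of samples per frame")
        num_frames = frames.shape[0]
        probs = np.full((num_frames, self.num_classes), 1.0 / self.num_classes)
        # Per-frame mean and standard deviation serve as two toy features.
        feats = np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)
        return SketchOutput(class_probs=probs, features=feats)


model = UniformModel(input_samples=4, num_classes=3)
output = model.process_array(np.zeros((2, 4)))
print(output.class_probs.shape, output.features.shape)
```

A real subclass would replace the body of `process_array` with a call to the underlying network and set the remaining documented attributes (`samplerate`, `tags`, `confidence_threshold`).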
process_clip(clip, fmt='soundevent')
Process an audio clip and return the model output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip | Clip | The | required |
fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of | 'soundevent' |
Returns:
Type | Description |
---|---|
Union[List[ClipPrediction], Dataset] | The model output in the specified format. |
process_file(path, fmt='soundevent', **kwargs)
Process an audio file and return the model output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the audio file. | required |
fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of | 'soundevent' |
**kwargs | | Additional keyword arguments to pass to | {} |
Returns:
Type | Description |
---|---|
Union[List[ClipPrediction], Dataset] | The model output in the specified format. |
process_iterable(iterable)
Process an iterable of audio clips and return a list of predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iterable | BaseIterator | An iterator that yields | required |
Returns:
Type | Description |
---|---|
List[ClipPrediction] | A list of |
process_recording(recording, fmt='soundevent')
Process an audio recording and return the model output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
recording | Recording | The | required |
fmt | Literal['soundevent', 'dataset'] | The desired output format. "soundevent" returns a list of | 'soundevent' |
Returns:
Type | Description |
---|---|
Union[List[ClipPrediction], Dataset] | The model output in the specified format. |
ModelOutput
Bases: NamedTuple
Output format for audio classification models.
Attributes:
Name | Type | Description |
---|---|---|
class_probs | ndarray | Array of class probabilities for each frame. |
features | ndarray | Array of extracted features for each frame. |
Attributes
class_probs: np.ndarray
instance-attribute
Array of class probabilities for each frame.
The array has shape (num_frames, num_classes), where num_frames is the number of frames in the input audio clip and num_classes is the number of classes that the model can predict.
Note that the interpretation of these values may vary between models, so check the model's documentation for details.
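As an illustration of how an array of this shape can be combined with a confidence threshold, the sketch below lists the (frame, class) pairs whose probability reaches the threshold. The probabilities and the threshold value are made up, and the use of `>=` is an assumption: how audioclass itself applies `confidence_threshold` is not described here.

```python
import numpy as np

# Hypothetical probabilities for 2 frames and 3 classes.
class_probs = np.array(
    [
        [0.05, 0.80, 0.15],
        [0.60, 0.10, 0.30],
    ]
)
confidence_threshold = 0.5  # illustrative value, not a library default

# (frame_index, class_index) pairs whose probability meets the threshold.
detections = [
    (int(frame), int(cls))
    for frame, cls in np.argwhere(class_probs >= confidence_threshold)
]
print(detections)
```

Each class index would then be mapped to the corresponding entry of the model's `tags` list to obtain a readable label.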
features: np.ndarray
instance-attribute
Array of extracted features for each frame.
The array has shape (num_frames, num_features), where num_frames is the number of frames in the input audio clip and num_features is the number of features extracted by the model.
The features can be used for further analysis or visualization of the model output.
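One possible form of such further analysis is to pool the per-frame features into a single clip-level vector and to compare frames by cosine similarity. The sketch below uses made-up feature values; `cosine_similarity` is a helper defined here for illustration and is not part of audioclass.

```python
import numpy as np

# Hypothetical features for 3 frames with 4 features each.
features = np.array(
    [
        [1.0, 0.0, 2.0, 4.0],
        [3.0, 0.0, 2.0, 0.0],
        [2.0, 3.0, 2.0, 2.0],
    ]
)

# Average over the frame axis to get one vector for the whole clip.
clip_embedding = features.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


similarity = cosine_similarity(features[0], features[1])
print(clip_embedding, similarity)
```

Mean pooling is only one choice; depending on the model, max pooling or keeping the full per-frame sequence may suit the analysis better.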