Evaluation#
The soundevent.evaluation module provides a comprehensive suite of tools for evaluating sound event analysis systems.
It covers various tasks ranging from simple clip-level classification to detailed sound event detection.
Additional dependencies
To use the soundevent.evaluation module you need to install some
additional dependencies. Make sure they are installed before importing
the module.
Evaluation Tasks#
The module supports several standard evaluation tasks. Each task corresponds to a specific way of framing the bioacoustic problem and requires different inputs and metrics.
- Sound Event Detection (SED): Evaluating systems that detect and classify sound events, each corresponding to a region of interest in the time-frequency domain.
- Clip Classification: Evaluating systems that assign a single label to an entire audio clip.
- Clip Multilabel Classification: Evaluating systems that can assign multiple labels to an audio clip (e.g., multiple species present).
- Sound Event Classification: Evaluating systems that classify pre-segmented sound events.
Core Components#
The evaluation module is built upon several core components that handle matching, scoring, and encoding.
Matching#
Algorithms for matching predictions to ground truth annotations.
soundevent.evaluation.match#
Algorithms for matching predictions to ground truths.
Classes:

| Name | Description |
|---|---|
| `Match` | Result of matching a single prediction to a ground truth annotation. |

Functions:

| Name | Description |
|---|---|
| `match_detections_and_gts` | Match predictions to ground truths greedily based on confidence scores. |
| `match_geometries` | Match geometries between a source and a target sequence. |
Attributes#
Detection = TypeVar('Detection') module-attribute#
GroundTruth = TypeVar('GroundTruth') module-attribute#
Classes#
Match(prediction, annotation, affinity_score, prediction_score) dataclass#
Bases: `Generic[Detection, GroundTruth]`
Result of matching a single prediction to a ground truth annotation.
This tuple covers three possible scenarios:
- True Positive (TP): Both `prediction` and `annotation` are present. The `affinity_score` indicates the quality of the match.
- False Positive (FP): `prediction` is present, but `annotation` is None. This occurs when a prediction had no valid overlap with any ground truth (or its best matches were taken by higher-scoring predictions).
- False Negative (FN): `annotation` is present, but `prediction` is None. This occurs when a ground truth object was not matched by any prediction.
Attributes:

| Name | Type | Description |
|---|---|---|
| `prediction` | `Detection \| None` | The predicted object. If None, this represents a missed ground truth (FN). |
| `annotation` | `GroundTruth \| None` | The target object. If None, this represents a false alarm (FP). |
| `affinity_score` | `float` | The affinity (e.g., IoU) between the prediction and the annotation. Returns 0.0 if either the prediction or annotation is None. |
| `prediction_score` | `float` | The confidence score of the prediction. Returns 0.0 if the prediction is None. |
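The three outcomes above map directly onto standard detection metrics. As a rough sketch of how counts can be derived from a stream of such matches (using plain tuples as a stand-in for the library's `Match` class; the values here are illustrative):

```python
# Stand-in match tuples: (prediction, annotation, affinity_score, prediction_score)
matches = [
    ("det_a", "gt_1", 0.8, 0.9),  # TP: both present
    ("det_b", None, 0.0, 0.4),    # FP: prediction without a ground truth
    (None, "gt_2", 0.0, 0.0),     # FN: ground truth without a prediction
]

tp = sum(1 for p, a, _, _ in matches if p is not None and a is not None)
fp = sum(1 for p, a, _, _ in matches if p is not None and a is None)
fn = sum(1 for p, a, _, _ in matches if p is None and a is not None)

precision = tp / (tp + fp)  # 0.5
recall = tp / (tp + fn)     # 0.5
```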
Functions#
match_detections_and_gts(detections, ground_truths, affinity, affinity_threshold=0, score=None, strict_match=False)#
Match predictions to ground truths greedily based on confidence scores.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `detections` | `Sequence[Detection]` | A sequence of prediction objects. | *required* |
| `ground_truths` | `Sequence[GroundTruth]` | A sequence of target objects to match against. | *required* |
| `affinity` | `ndarray \| Callable[[Detection, GroundTruth], float]` | Either a precomputed (N_det, N_gt) affinity matrix or a function to compute the affinity score between a detection and a ground truth. | *required* |
| `affinity_threshold` | `float` | Matches with affinity <= threshold are discarded. Defaults to 0.0. | `0` |
| `score` | `ndarray \| Callable[[Detection], float] \| Sequence[float] \| None` | The confidence scores used to prioritise detections during greedy matching. Higher scores are processed first. Accepts a precomputed array or sequence of scores, a callable that computes a score per detection, or None. | `None` |
| `strict_match` | `bool` | If True, prevents fallback to the second-best ground truth if the best one is taken. | `False` |
Yields:

| Type | Description |
|---|---|
| `EvaluationMatch` | A named tuple containing the matching results. The iterator yields a "Full Outer Join" of the inputs: all predictions and all ground truths appear exactly once in the output stream, either paired together or paired with None. |
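The greedy strategy can be sketched in plain Python. This is an illustration of the general algorithm, not the library's implementation, and the toy `iou` below is a simple 1-D overlap length rather than a true IoU:

```python
def greedy_match(detections, ground_truths, affinity, scores, affinity_threshold=0.0):
    """Greedily pair detections with ground truths, highest score first."""
    unmatched_gts = set(range(len(ground_truths)))
    order = sorted(range(len(detections)), key=lambda i: scores[i], reverse=True)
    results = []
    for i in order:
        # Keep only still-available ground truths above the affinity threshold.
        candidates = [
            (affinity(detections[i], ground_truths[j]), j)
            for j in unmatched_gts
        ]
        candidates = [(a, j) for a, j in candidates if a > affinity_threshold]
        if candidates:
            best_affinity, j = max(candidates)
            unmatched_gts.discard(j)
            results.append((detections[i], ground_truths[j], best_affinity, scores[i]))
        else:
            results.append((detections[i], None, 0.0, scores[i]))  # false positive
    for j in sorted(unmatched_gts):
        results.append((None, ground_truths[j], 0.0, 0.0))  # false negatives
    return results

# Toy affinity: 1-D overlap length between (start, end) intervals.
iou = lambda a, b: max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
print(greedy_match([(0.0, 1.0), (5.0, 6.0)], [(0.5, 1.5)], iou, scores=[0.9, 0.8]))
# [((0.0, 1.0), (0.5, 1.5), 0.5, 0.9), ((5.0, 6.0), None, 0.0, 0.8)]
```

Every detection and every ground truth appears exactly once in the output, matching the "Full Outer Join" behaviour described above.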
match_geometries(source, target, time_buffer=0.01, freq_buffer=100, affinity_threshold=0)#
Match geometries between a source and a target sequence.
The matching is performed by first computing an affinity matrix between
all pairs of source and target geometries. The affinity is a measure of
similarity, calculated as the Intersection over Union (IoU). For more
details on how affinity is computed, see
soundevent.evaluation.affinity.compute_affinity.
The affinity calculation is influenced by the time_buffer and
freq_buffer parameters, which add a buffer to each geometry before
comparison. This can help account for small variations in annotations.
Once the affinity matrix is computed, the Hungarian algorithm (via
scipy.optimize.linear_sum_assignment) is used to find an optimal
assignment of source to target geometries that maximizes the total
affinity.
Finally, matches with an affinity below affinity_threshold are
discarded and considered as unmatched.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Sequence[Geometry]` | The source geometries to match. | *required* |
| `target` | `Sequence[Geometry]` | The target geometries to match. | *required* |
| `time_buffer` | `float` | A buffer in seconds added to each geometry when computing affinity. See `soundevent.evaluation.affinity.compute_affinity`. | `0.01` |
| `freq_buffer` | `float` | A buffer in Hertz added to each geometry when computing affinity. See `soundevent.evaluation.affinity.compute_affinity`. | `100` |
| `affinity_threshold` | `float` | The minimum affinity (IoU) for a pair of geometries to be considered a match. Pairs with affinity below this value are considered unmatched, by default 0. | `0` |

Returns:

| Type | Description |
|---|---|
| `Iterable[Tuple[Optional[int], Optional[int], float]]` | An iterable of matching results. Each source and target geometry is accounted for exactly once in the output. Each tuple contains the source index (or None for an unmatched target), the target index (or None for an unmatched source), and the affinity score of the pair. |
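The optimal-assignment step can be sketched with `scipy.optimize.linear_sum_assignment` on a precomputed affinity matrix. This is a simplified illustration (the real function also buffers geometries and computes the IoU itself), and `match_by_affinity` is a hypothetical helper name:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_affinity(affinity, affinity_threshold=0.0):
    """Optimal one-to-one matching on an (N_source, N_target) affinity matrix,
    yielding (source_index, target_index, affinity) triples; unmatched
    geometries appear with None on the missing side."""
    rows, cols = linear_sum_assignment(affinity, maximize=True)
    matched_sources, matched_targets = set(), set()
    results = []
    for i, j in zip(rows, cols):
        # Assignments at or below the threshold count as unmatched.
        if affinity[i, j] > affinity_threshold:
            results.append((int(i), int(j), float(affinity[i, j])))
            matched_sources.add(i)
            matched_targets.add(j)
    for i in range(affinity.shape[0]):
        if i not in matched_sources:
            results.append((i, None, 0.0))
    for j in range(affinity.shape[1]):
        if j not in matched_targets:
            results.append((None, j, 0.0))
    return results

affinity = np.array([
    [0.9, 0.1],
    [0.7, 0.0],
    [0.0, 0.0],
])
print(match_by_affinity(affinity, affinity_threshold=0.2))
# [(0, 0, 0.9), (1, None, 0.0), (2, None, 0.0), (None, 1, 0.0)]
```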
Affinity#
Functions to compute the similarity (affinity) between geometries (e.g., IoU, temporal distance).
soundevent.evaluation.affinity#
Measures of affinity between sound event geometries.
Functions:

| Name | Description |
|---|---|
| `compute_affinity` | Compute the geometric affinity between two geometries. |
| `compute_affinity_in_time` | Compute the temporal affinity between two geometries. |
Attributes#
TIME_GEOMETRY_TYPES = {data.TimeStamp.geom_type(), data.TimeInterval.geom_type()} module-attribute#
Functions#
compute_affinity(geometry1, geometry2, time_buffer=0.01, freq_buffer=100)#
Compute the geometric affinity between two geometries.
This function calculates the geometric similarity between two geometries, which is a measure of how much they overlap. The affinity is computed as the Intersection over Union (IoU).
Intersection over Union (IoU)
IoU is a standard metric for comparing the similarity between two shapes. It is calculated as the ratio of the area of the overlap between the two geometries to the area of their combined shape.
$$
\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}
$$
An IoU of 1 means the geometries are identical, while an IoU of 0 means they do not overlap at all. This is particularly useful in bioacoustics for comparing annotations or predictions of sound events in a time-frequency representation (spectrogram).
To account for small variations in annotations, a buffer can be added to
each geometry before computing the IoU. This is controlled by the
time_buffer and freq_buffer parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `geometry1` | `Geometry` | The first geometry to be compared. | *required* |
| `geometry2` | `Geometry` | The second geometry to be compared. | *required* |
| `time_buffer` | `float` | Time buffer in seconds added to each geometry. Default is 0.01. | `0.01` |
| `freq_buffer` | `float` | Frequency buffer in Hertz added to each geometry. Default is 100. | `100` |

Returns:

| Name | Type | Description |
|---|---|---|
| `affinity` | `float` | The Intersection over Union (IoU) score, a value between 0 and 1 indicating the degree of overlap. |
Notes
- If either input geometry is of a time-based type, a specialized time-based affinity calculation is performed.
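As a rough illustration of the buffered-IoU idea for one-dimensional time intervals (a simplified stand-in; the library also handles two-dimensional time-frequency geometries):

```python
def buffered_time_iou(a, b, time_buffer=0.01):
    """IoU of two (start, end) time intervals, each padded by a buffer."""
    a = (a[0] - time_buffer, a[1] + time_buffer)
    b = (b[0] - time_buffer, b[1] + time_buffer)
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0

# Identical intervals give an IoU of 1.0.
print(buffered_time_iou((1.0, 2.0), (1.0, 2.0)))  # 1.0
# Disjoint but nearby intervals can still overlap once buffered.
print(buffered_time_iou((1.0, 2.0), (2.005, 3.0), time_buffer=0.01))
```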
compute_affinity_in_time(geometry1, geometry2)#
Compute the temporal affinity between two geometries.
Encoding#
Utilities for encoding tags and predictions into numerical formats for metric computation.
soundevent.evaluation.encoding#
Tag Encoder Module.
Classes:

| Name | Description |
|---|---|
| `Encoder` | A callable object that maps tags into integers. |
| `SimpleEncoder` | A basic implementation of the Encoder protocol. |

Functions:

| Name | Description |
|---|---|
| `classification_encoding` | Encode a list of tags into an integer value. |
| `create_tag_encoder` | Create an encoder object from a list of tags. |
| `multilabel_encoding` | Encode a list of tags into a binary multilabel array. |
| `prediction_encoding` | Encode a list of predicted tags into a floating-point array of scores. |
Classes#
Encoder#
Bases: Protocol
A callable object that maps tags into integers.
This protocol defines the structure of an encoder function, which maps
tags into integers. The encoder function takes a data.Tag object as input
and returns an optional integer encoding. If the encoder returns None for
a tag, it will be skipped.
Attributes:

| Name | Type | Description |
|---|---|---|
| `num_classes` | `int` | The total number of classes for the encoding. |

Methods:

| Name | Description |
|---|---|
| `decode` | Decode an integer into a tag. |
| `encode` | Encode a tag into an integer. |
SimpleEncoder(tags)#
Bases: Encoder
A basic implementation of the Encoder protocol.
This class provides a simple implementation of the Encoder protocol. It encodes tags into integers and decodes integers back into tags based on a provided list of tags.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | *required* |

Methods:

| Name | Description |
|---|---|
| `decode` | Decode an integer into a tag. |
| `encode` | Encode a tag into an integer. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `num_classes` | `int` | The total number of classes for the encoding. |
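A minimal encoder along these lines could look like the following sketch. The `Tag` dataclass and `DictEncoder` class here are illustrative stand-ins, not the library's `data.Tag` or `SimpleEncoder`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    key: str
    value: str

class DictEncoder:
    """Map a fixed list of tags to consecutive integers; unknown tags -> None."""

    def __init__(self, tags):
        self._index = {tag: i for i, tag in enumerate(tags)}
        self._tags = list(tags)

    @property
    def num_classes(self):
        return len(self._tags)

    def encode(self, tag):
        # Tags outside the class list encode to None and are skipped.
        return self._index.get(tag)

    def decode(self, index):
        return self._tags[index]

encoder = DictEncoder([Tag("animal", "dog"), Tag("color", "brown")])
print(encoder.encode(Tag("animal", "dog")))  # 0
print(encoder.encode(Tag("animal", "cat")))  # None
print(encoder.decode(1))                     # Tag(key='color', value='brown')
```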
Functions#
classification_encoding(tags, encoder)#
Encode a list of tags into an integer value.
This function is commonly used for mapping a list of tags to a compact integer representation, typically representing classes associated with objects like clips or sound events.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | *required* |
| `encoder` | `Callable[[Tag], Optional[int]]` | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `encoded` | `Optional[int]` | The encoded integer value representing the tags, or None if no encoding is available. |
Examples:
Consider the following set of tags:

```python
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
```

If we are interested in encoding only the 'dog' and 'brown' classes, only those tags contribute to the encoding; any other tags are skipped.
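The skip-on-None behaviour can be sketched in plain Python. Note that the first-match semantics here is an assumption for illustration, and plain `(key, value)` tuples stand in for `data.Tag` objects:

```python
def encode_classification(tags, encoder):
    """Return the first non-None encoding the encoder produces, or None.

    Tags the encoder maps to None are skipped (first-match semantics is
    assumed here for illustration).
    """
    for tag in tags:
        encoded = encoder(tag)
        if encoded is not None:
            return encoded
    return None

# Hypothetical encoder: only 'dog' -> 0 and 'brown' -> 1 are classes.
classes = {("animal", "dog"): 0, ("color", "brown"): 1}
encoder = lambda tag: classes.get(tag)

print(encode_classification([("animal", "cat"), ("color", "brown")], encoder))  # 1
print(encode_classification([("animal", "cat"), ("color", "blue")], encoder))   # None
```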
create_tag_encoder(tags)#
Create an encoder object from a list of tags.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | *required* |

Returns:

| Type | Description |
|---|---|
| `SimpleEncoder` | An instance of SimpleEncoder initialized with the provided tags. |
multilabel_encoding(tags, encoder)#
Encode a list of tags into a binary multilabel array.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | *required* |
| `encoder` | `Encoder` | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `encoded` | `ndarray` | A binary numpy array of shape (num_classes,) representing the multilabel encoding for the input tags. Each index with a corresponding tag is set to 1, and the rest are 0. |
Examples:
Consider the following set of tags:

```python
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
```

Suppose we are only interested in encoding the 'dog' and 'brown' classes; tags outside these two classes are ignored, and each recognised class sets its index in the binary output array to 1.
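A sketch of the multilabel encoding in plain numpy. Here `num_classes` is passed explicitly and `(key, value)` tuples stand in for `data.Tag` objects, which simplifies the real signature:

```python
import numpy as np

def encode_multilabel(tags, encoder, num_classes):
    """Binary vector with a 1 at every class index the encoder recognises."""
    encoded = np.zeros(num_classes, dtype=float)
    for tag in tags:
        index = encoder(tag)
        if index is not None:  # unrecognised tags are skipped
            encoded[index] = 1.0
    return encoded

# Hypothetical encoder: only 'dog' -> 0 and 'brown' -> 1 are classes.
classes = {("animal", "dog"): 0, ("color", "brown"): 1}
encoder = lambda tag: classes.get(tag)

print(encode_multilabel([("animal", "dog"), ("color", "brown")], encoder, 2))  # [1. 1.]
print(encode_multilabel([("animal", "cat")], encoder, 2))                      # [0. 0.]
```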
prediction_encoding(tags, encoder)#
Encode a list of predicted tags into a floating-point array of scores.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tags` | `Sequence[PredictedTag]` | A list of predicted tags to be encoded. | *required* |
| `encoder` | `Encoder` | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `encoded` | `ndarray` | A numpy array of floats of shape (num_classes,) representing the predicted scores for each class. The array contains the scores for each class corresponding to the input predicted tags. |
Examples:
Consider the following set of tags:

```python
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
```

Suppose we are only interested in encoding the 'dog' and 'brown' classes (indices 0 and 1, respectively). Then the following examples show how the encoding works for predicted tags:

```python
>>> prediction_encoding([data.PredictedTag(tag=brown, score=0.5)], encoder)
array([0. , 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=blue, score=0.9),
...     ],
...     encoder,
... )
array([0.2, 0. ]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=brown, score=0.5),
...     ],
...     encoder,
... )
array([0.2, 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=cat, score=0.7),
...     ],
...     encoder,
... )
array([0., 0.]...)
```