Evaluation#

The soundevent.evaluation module provides a comprehensive suite of tools for evaluating sound event analysis systems. It covers various tasks ranging from simple clip-level classification to detailed sound event detection.

Additional dependencies

To use the soundevent.evaluation module, you need to install some additional dependencies. Make sure they are installed by running the following command:

pip install soundevent[evaluation]

Evaluation Tasks#

The module supports several standard evaluation tasks. Each task corresponds to a specific way of framing the bioacoustic problem and requires different inputs and metrics.

Core Components#

The evaluation module is built upon several core components that handle matching, scoring, and encoding.

Matching#

Algorithms for matching predictions to ground truth annotations.

soundevent.evaluation.match #

Algorithms for matching predictions to ground truths.

Classes:

Name Description
Match

Result of matching a single prediction to a ground truth annotation.

Functions:

Name Description
match_detections_and_gts

Match predictions to ground truths greedily based on confidence scores.

match_geometries

Match geometries between a source and a target sequence.

Attributes#

Detection = TypeVar('Detection') module-attribute #

GroundTruth = TypeVar('GroundTruth') module-attribute #

Classes#

Match(prediction, annotation, affinity_score, prediction_score) dataclass #

Bases: Generic[Detection, GroundTruth]

Result of matching a single prediction to a ground truth annotation.

This class covers three possible scenarios:

  1. True Positive (TP): Both prediction and annotation are present. The affinity_score indicates the quality of the match.
  2. False Positive (FP): prediction is present, but annotation is None. This occurs when a prediction had no valid overlap with any ground truth (or its best matches were taken by higher-scoring predictions).
  3. False Negative (FN): annotation is present, but prediction is None. This occurs when a ground truth object was not matched by any prediction.

Attributes:

Name Type Description
prediction Detection | None

The predicted object. If None, this represents a missed ground truth (FN).

annotation GroundTruth | None

The target object. If None, this represents a false alarm (FP).

affinity_score float

The affinity (e.g., IoU) between the prediction and the annotation. Returns 0.0 if either the prediction or annotation is None.

prediction_score float

The confidence score of the prediction. Returns 0.0 if the prediction is None.

Attributes#
affinity_score instance-attribute #
annotation instance-attribute #
prediction instance-attribute #
prediction_score instance-attribute #

Functions#

match_detections_and_gts(detections, ground_truths, affinity, affinity_threshold=0, score=None, strict_match=False) #

Match predictions to ground truths greedily based on confidence scores.

Parameters:

Name Type Description Default
detections Sequence[Detection]

A sequence of prediction objects.

required
ground_truths Sequence[GroundTruth]

A sequence of target objects to match against.

required
affinity ndarray | Callable[[Detection, GroundTruth], float]

Either a precomputed (N_det, N_gt) affinity matrix or a function to compute the affinity score between a detection and a ground truth.

required
affinity_threshold float

Matches with affinity <= threshold are discarded. Defaults to 0.0.

0
score ndarray | Callable[[Detection], float] | Sequence[float] | None

The confidence scores used to prioritize detections during greedy matching. Higher scores are processed first. This parameter accepts:

  • A sequence or array of scores corresponding to detections (must match the length and order of the input list).
  • A callable that extracts a float score from a single prediction object.
  • None: All predictions are assigned a score of 1.0, preserving the original input order.
None
strict_match bool

If True, prevents fallback to the second-best ground truth if the best one is taken.

False

Yields:

Type Description
EvaluationMatch

A match result containing the following fields:

  • prediction: The Detection object (or None for false negatives).
  • annotation: The GroundTruth object (or None for false positives).
  • affinity_score: The score of the match (0.0 if unmatched).
  • prediction_score: The prediction confidence score (0.0 if unmatched).

The iterator yields a "Full Outer Join" of the inputs: all predictions and all ground truths will appear exactly once in the output stream, either paired together or paired with None.
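The greedy strategy described above can be sketched in plain Python. This is a simplified stand-in, not the library implementation; `greedy_match`, `iou`, and the interval tuples are illustrative only:

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def iou(a: Interval, b: Interval) -> float:
    """1-D intersection-over-union between two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    detections: Sequence[Interval],
    ground_truths: Sequence[Interval],
    affinity: Callable[[Interval, Interval], float],
    scores: Sequence[float],
    threshold: float = 0.0,
):
    """Greedy matching: highest-scoring detections choose first."""
    unmatched_gt = set(range(len(ground_truths)))
    results = []
    # Process detections in descending score order.
    for i in sorted(range(len(detections)), key=lambda i: -scores[i]):
        best, best_aff = None, threshold
        for j in unmatched_gt:
            a = affinity(detections[i], ground_truths[j])
            if a > best_aff:  # affinity <= threshold is discarded
                best, best_aff = j, a
        if best is None:
            # False positive: no available ground truth above threshold.
            results.append((detections[i], None, 0.0, scores[i]))
        else:
            # True positive: claim this ground truth.
            unmatched_gt.discard(best)
            results.append((detections[i], ground_truths[best], best_aff, scores[i]))
    for j in sorted(unmatched_gt):
        # False negative: ground truth never matched.
        results.append((None, ground_truths[j], 0.0, 0.0))
    return results


matches = greedy_match(
    [(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)],  # detections
    [(0.1, 1.1), (2.5, 3.5)],              # ground truths
    iou,
    scores=[0.9, 0.8, 0.4],
)
# Three results: two true positives and one false positive for (5.0, 6.0).
```

Every detection and every ground truth appears exactly once in the output, mirroring the "full outer join" behaviour described above.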

match_geometries(source, target, time_buffer=0.01, freq_buffer=100, affinity_threshold=0) #

Match geometries between a source and a target sequence.

The matching is performed by first computing an affinity matrix between all pairs of source and target geometries. The affinity is a measure of similarity, calculated as the Intersection over Union (IoU). For more details on how affinity is computed, see soundevent.evaluation.affinity.compute_affinity.

The affinity calculation is influenced by the time_buffer and freq_buffer parameters, which add a buffer to each geometry before comparison. This can help account for small variations in annotations.

Once the affinity matrix is computed, the Hungarian algorithm (via scipy.optimize.linear_sum_assignment) is used to find an optimal assignment of source to target geometries that maximizes the total affinity.

Finally, matches with an affinity below affinity_threshold are discarded and considered as unmatched.

Parameters:

Name Type Description Default
source Sequence[Geometry]

The source geometries to match.

required
target Sequence[Geometry]

The target geometries to match.

required
time_buffer float

A buffer in seconds added to each geometry when computing affinity. See soundevent.evaluation.affinity.compute_affinity for more details. Defaults to 0.01.

0.01
freq_buffer float

A buffer in Hertz added to each geometry when computing affinity. See soundevent.evaluation.affinity.compute_affinity for more details. Defaults to 100.

100
affinity_threshold float

The minimum affinity (IoU) for a pair of geometries to be considered a match. Pairs with affinity below this value are considered unmatched. Defaults to 0.

0

Returns:

Type Description
Iterable[Tuple[Optional[int], Optional[int], float]]

An iterable of matching results. Each source and target geometry is accounted for exactly once in the output. Each tuple can be one of:

  • (source_index, target_index, affinity): A successful match between a source and a target geometry with an affinity score.
  • (source_index, None, 0): An unmatched source geometry.
  • (None, target_index, 0): An unmatched target geometry.
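The assignment step can be illustrated with a brute-force stand-in for the Hungarian algorithm. This is a sketch only; `match_by_affinity` is a hypothetical helper, and it assumes no more sources than targets for brevity:

```python
from itertools import permutations


def match_by_affinity(affinity, threshold=0.0):
    """Optimal one-to-one matching on an (n_src x n_tgt) affinity matrix.

    Brute-forces the assignment that the Hungarian algorithm
    (scipy.optimize.linear_sum_assignment) finds efficiently.
    Assumes n_src <= n_tgt.
    """
    n_src, n_tgt = len(affinity), len(affinity[0])
    # Pick the target permutation maximizing total affinity.
    best = max(
        permutations(range(n_tgt), n_src),
        key=lambda perm: sum(affinity[i][j] for i, j in enumerate(perm)),
    )
    results, matched_tgt = [], set()
    for i, j in enumerate(best):
        if affinity[i][j] > threshold:
            results.append((i, j, affinity[i][j]))  # matched pair
            matched_tgt.add(j)
        else:
            results.append((i, None, 0.0))          # unmatched source
    for j in range(n_tgt):
        if j not in matched_tgt:
            results.append((None, j, 0.0))          # unmatched target
    return results


# Source 0 overlaps target 0 strongly; source 1 overlaps nothing.
result = match_by_affinity([[0.8, 0.1], [0.0, 0.0]])
# [(0, 0, 0.8), (1, None, 0.0), (None, 1, 0.0)]
```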

Affinity#

Functions to compute the similarity (affinity) between geometries (e.g., IoU, temporal distance).

soundevent.evaluation.affinity #

Measures of affinity between sound events geometries.

Functions:

Name Description
compute_affinity

Compute the geometric affinity between two geometries.

Attributes#

TIME_GEOMETRY_TYPES = {data.TimeStamp.geom_type(), data.TimeInterval.geom_type()} module-attribute #

Functions#

compute_affinity(geometry1, geometry2, time_buffer=0.01, freq_buffer=100) #

Compute the geometric affinity between two geometries.

This function calculates the geometric similarity between two geometries, which is a measure of how much they overlap. The affinity is computed as the Intersection over Union (IoU).

Intersection over Union (IoU)

IoU is a standard metric for comparing the similarity between two shapes. It is calculated as the ratio of the area of the overlap between the two geometries to the area of their combined shape.

IoU = Area of Overlap / Area of Union

An IoU of 1 means the geometries are identical, while an IoU of 0 means they do not overlap at all. This is particularly useful in bioacoustics for comparing annotations or predictions of sound events in a time-frequency representation (spectrogram).

To account for small variations in annotations, a buffer can be added to each geometry before computing the IoU. This is controlled by the time_buffer and freq_buffer parameters.

Parameters:

Name Type Description Default
geometry1 Geometry

The first geometry to be compared.

required
geometry2 Geometry

The second geometry to be compared.

required
time_buffer float

Time buffer in seconds added to each geometry. Default is 0.01.

0.01
freq_buffer float

Frequency buffer in Hertz added to each geometry. Default is 100.

100

Returns:

Name Type Description
affinity float

The Intersection over Union (IoU) score, a value between 0 and 1 indicating the degree of overlap.

Notes
  • If either input geometry is of a time-based type, a specialized time-based affinity calculation is performed.

Examples:

>>> geometry1 = data.BoundingBox(coordinates=[0.4, 2000, 0.6, 8000])
>>> geometry2 = data.BoundingBox(coordinates=[0.5, 5000, 0.7, 6000])
>>> affinity = compute_affinity(
...     geometry1,
...     geometry2,
...     time_buffer=0.02,
...     freq_buffer=150,
... )
>>> print(round(affinity, 3))
0.111

compute_affinity_in_time(geometry1, geometry2) #

Compute the temporal affinity between two geometries.
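As a rough sketch, time-based affinity can be pictured as a 1-D IoU over `(start, end)` intervals. This is an assumption for illustration; the library's actual calculation may differ in details (e.g. buffers or point geometries):

```python
def time_iou(a, b):
    """1-D IoU between two (start, end) time intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


overlap = time_iou((0.0, 1.0), (0.5, 1.5))  # 0.5 s overlap over a 1.5 s union
```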

Encoding#

Utilities for encoding tags and predictions into numerical formats for metric computation.

soundevent.evaluation.encoding #

Tag Encoder Module.

Classes:

Name Description
Encoder

A callable object that maps tags into integers.

Functions:

Name Description
classification_encoding

Encode a list of tags into an integer value.

create_tag_encoder

Create an encoder object from a list of tags.

multilabel_encoding

Encode a list of tags into a binary multilabel array.

prediction_encoding

Encode a list of predicted tags into a floating-point array of scores.

Classes#

Encoder #

Bases: Protocol

A callable object that maps tags into integers.

This protocol defines the structure of an encoder function, which maps tags into integers. The encoder function takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped.
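Any object exposing `num_classes`, `encode`, and `decode` satisfies the protocol. A minimal hypothetical implementation, with a namedtuple standing in for `data.Tag`:

```python
from collections import namedtuple
from typing import Optional

Tag = namedtuple("Tag", ["key", "value"])  # stand-in for data.Tag


class BinaryEncoder:
    """Hypothetical Encoder: a single target tag maps to class 0;
    every other tag maps to None and is skipped."""

    num_classes = 1

    def __init__(self, key: str, value: str):
        self._tag = Tag(key, value)

    def encode(self, tag: Tag) -> Optional[int]:
        # None signals "no encoding available for this tag".
        return 0 if tag == self._tag else None

    def decode(self, index: int) -> Tag:
        if index != 0:
            raise ValueError(f"unknown class index: {index}")
        return self._tag


enc = BinaryEncoder("animal", "dog")
enc.encode(Tag("animal", "dog"))  # 0
enc.encode(Tag("color", "blue"))  # None
```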

Attributes:

Name Type Description
num_classes int

The total number of classes for the encoding.

Methods:

Name Description
decode

Decode an integer into a tag.

encode

Encode a tag into an integer.

Attributes#
num_classes instance-attribute #
Functions#
decode(index) #

Decode an integer into a tag.

Parameters:

Name Type Description Default
index int

The integer index to be decoded.

required

Returns:

Type Description
Tag

The decoded tag.

encode(tag) #

Encode a tag into an integer.

Parameters:

Name Type Description Default
tag Tag

The tag to be encoded.

required

Returns:

Type Description
Optional[int]

The encoded integer value representing the tag, or None if no encoding is available.

SimpleEncoder(tags) #

Bases: Encoder

A basic implementation of the Encoder protocol.

This class provides a simple implementation of the Encoder protocol. It encodes tags into integers and decodes integers back into tags based on a provided list of tags.

Parameters:

Name Type Description Default
tags Sequence[Tag]

A list of tags to be encoded.

required

Methods:

Name Description
decode
encode

Attributes:

Name Type Description
num_classes
Attributes#
num_classes = len(tags) instance-attribute #
Functions#
decode(index) #
encode(tag) #
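SimpleEncoder's documented behaviour can be sketched as position-based indexing. This is a hypothetical re-implementation for illustration, with a namedtuple standing in for `data.Tag`:

```python
from collections import namedtuple

Tag = namedtuple("Tag", ["key", "value"])  # stand-in for data.Tag


class PositionalEncoder:
    """Sketch of SimpleEncoder: a tag's class index is its position
    in the input list, so num_classes = len(tags)."""

    def __init__(self, tags):
        self._tags = list(tags)
        self._index = {tag: i for i, tag in enumerate(self._tags)}
        self.num_classes = len(self._tags)

    def encode(self, tag):
        return self._index.get(tag)  # None for tags outside the list

    def decode(self, index):
        return self._tags[index]


enc = PositionalEncoder([Tag("animal", "dog"), Tag("color", "brown")])
enc.encode(Tag("color", "brown"))  # 1
enc.decode(0)                      # Tag(key='animal', value='dog')
```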

Functions#

classification_encoding(tags, encoder) #

Encode a list of tags into an integer value.

This function is commonly used for mapping a list of tags to a compact integer representation, typically representing classes associated with objects like clips or sound events.

Parameters:

Name Type Description Default
tags Sequence[Tag]

A list of tags to be encoded.

required
encoder Callable[[Tag], Optional[int]]

A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped.

required

Returns:

Name Type Description
encoded Optional[int]

The encoded integer value representing the tags, or None if no encoding is available.

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

If we are interested in encoding only the 'dog' and 'brown' classes, the following examples demonstrate how the encoding works for tag lists (the first tag with an available encoding determines the result):

>>> encoder = create_tag_encoder([dog, brown])
>>> classification_encoding([brown], encoder)
1
>>> classification_encoding([dog, blue], encoder)
0
>>> classification_encoding([dog, brown], encoder)
0
>>> classification_encoding([cat], encoder)

create_tag_encoder(tags) #

Create an encoder object from a list of tags.

Parameters:

Name Type Description Default
tags Sequence[Tag]

A list of tags to be encoded.

required

Returns:

Type Description
SimpleEncoder

An instance of SimpleEncoder initialized with the provided tags.

multilabel_encoding(tags, encoder) #

Encode a list of tags into a binary multilabel array.

Parameters:

Name Type Description Default
tags Sequence[Tag]

A list of tags to be encoded.

required
encoder Encoder

A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped.

required

Returns:

Name Type Description
encoded ndarray

A binary numpy array of shape (num_classes,) representing the multilabel encoding for the input tags. Each index with a corresponding tag is set to 1, and the rest are 0.

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

And we are only interested in encoding the following two classes:

>>> encoder = create_tag_encoder([dog, brown])

Then the following examples show how the multilabel encoding works:

>>> multilabel_encoding([brown], encoder)
array([0, 1]...)
>>> multilabel_encoding([dog, blue], encoder)
array([1, 0]...)
>>> multilabel_encoding([dog, brown], encoder)
array([1, 1]...)
>>> multilabel_encoding([cat], encoder)
array([0, 0]...)

prediction_encoding(tags, encoder) #

Encode a list of predicted tags into a floating-point array of scores.

Parameters:

Name Type Description Default
tags Sequence[PredictedTag]

A list of predicted tags to be encoded.

required
encoder Encoder

A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped.

required

Returns:

Name Type Description
encoded ndarray

A numpy array of floats of shape (num_classes,) representing the predicted scores for each class. The array contains the scores for each class corresponding to the input predicted tags.

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

And we are only interested in encoding the following two classes:

>>> encoder = create_tag_encoder([dog, brown])

Then the following examples show how the encoding works for predicted tags:

>>> prediction_encoding([data.PredictedTag(tag=brown, score=0.5)], encoder)
array([0. , 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=blue, score=0.9),
...     ],
...     encoder,
... )
array([0.2, 0. ]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=brown, score=0.5),
...     ],
...     encoder,
... )
array([0.2, 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=cat, score=0.7),
...     ],
...     encoder,
... )
array([0., 0.]...)