Evaluation Module#

Additional dependencies

To use the soundevent.evaluation module you need to install some additional dependencies. Make sure you have them installed by running the following command:

pip install soundevent[evaluation]

soundevent.evaluation #

Evaluation functions.

Modules:

| Name | Description |
| --- | --- |
| `affinity` | Measures of affinity between sound event geometries. |
| `clip_classification` | |
| `clip_multilabel_classification` | |
| `encoding` | Tag Encoder Module. |
| `match` | Algorithms for matching geometries. |
| `metrics` | |
| `sound_event_classification` | Sound event classification evaluation. |
| `sound_event_detection` | Sound event detection evaluation. |
| `tasks` | |

Functions:

| Name | Description |
| --- | --- |
| `classification_encoding` | Encode a list of tags into an integer value. |
| `compute_affinity` | Compute the geometric affinity between two geometries. |
| `create_tag_encoder` | Create an encoder object from a list of tags. |
| `match_geometries` | Match geometries between a source and a target sequence. |
| `multilabel_encoding` | Encode a list of tags into a binary multilabel array. |
| `prediction_encoding` | Encode a list of predicted tags into a floating-point array of scores. |

Functions#

classification_encoding(tags, encoder) #

Encode a list of tags into an integer value.

This function is commonly used for mapping a list of tags to a compact integer representation, typically representing classes associated with objects like clips or sound events.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | required |
| `encoder` | `Callable[[Tag], Optional[int]]` | A callable that takes a `data.Tag` object and returns an optional integer encoding. Tags for which the encoder returns `None` are skipped. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `encoded` | `Optional[int]` | The encoded integer value representing the tags, or `None` if no encoding is available. |

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

If we are interested in encoding only the 'dog' and 'brown' classes, the following examples demonstrate how the encoding works for different tag lists:

>>> encoder = create_tag_encoder([dog, brown])
>>> classification_encoding([brown], encoder)
1
>>> classification_encoding([dog, blue], encoder)
0
>>> classification_encoding([dog, brown], encoder)
0
>>> classification_encoding([cat], encoder)
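
The examples above are consistent with a simple first-match rule: return the code of the first tag the encoder recognizes, or `None` if none is recognized (which is why the final doctest prints nothing). A minimal sketch of that rule, using plain `(key, value)` tuples as toy stand-ins for `data.Tag` objects:

```python
from typing import Callable, Optional, Sequence, Tuple

Tag = Tuple[str, str]  # toy stand-in for data.Tag

def classification_encoding_sketch(
    tags: Sequence[Tag], encoder: Callable[[Tag], Optional[int]]
) -> Optional[int]:
    """Return the code of the first tag the encoder recognizes, else None."""
    for tag in tags:
        code = encoder(tag)
        if code is not None:
            return code
    return None

# Toy encoder: a dict lookup returning None for unknown tags.
classes = {("animal", "dog"): 0, ("color", "brown"): 1}
encoder = classes.get

print(classification_encoding_sketch([("color", "brown")], encoder))  # 1
print(classification_encoding_sketch([("animal", "dog"), ("color", "brown")], encoder))  # 0
print(classification_encoding_sketch([("animal", "cat")], encoder))  # None
```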

compute_affinity(geometry1, geometry2, time_buffer=0.01, freq_buffer=100) #

Compute the geometric affinity between two geometries.

This function calculates the geometric similarity between two geometries, which is a measure of how much they overlap. The affinity is computed as the Intersection over Union (IoU).

Intersection over Union (IoU)

IoU is a standard metric for comparing the similarity between two shapes. It is calculated as the ratio of the area of the overlap between the two geometries to the area of their combined shape.

$$
\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}
$$

An IoU of 1 means the geometries are identical, while an IoU of 0 means they do not overlap at all. This is particularly useful in bioacoustics for comparing annotations or predictions of sound events in a time-frequency representation (spectrogram).

To account for small variations in annotations, a buffer can be added to each geometry before computing the IoU. This is controlled by the time_buffer and freq_buffer parameters.
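
As an illustration of the formula (not the library's implementation), the IoU of two buffered time-frequency boxes can be computed by hand. The function and box tuples below are hypothetical names; boxes follow the `(start_time, low_freq, end_time, high_freq)` convention used by `data.BoundingBox`:

```python
def iou_boxes(box1, box2, time_buffer=0.01, freq_buffer=100.0):
    """Illustrative IoU of two (start_time, low_freq, end_time, high_freq) boxes.

    Each box is expanded by the buffers before the overlap is measured,
    mirroring the role of time_buffer/freq_buffer described above.
    """
    def expand(box):
        t1, f1, t2, f2 = box
        return (t1 - time_buffer, f1 - freq_buffer, t2 + time_buffer, f2 + freq_buffer)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    a, b = expand(box1), expand(box2)
    # Overlap along each axis (zero if the boxes are disjoint on that axis).
    dt = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    df = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = dt * df
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0

# Same boxes and buffers as the compute_affinity example below:
print(round(iou_boxes((0.4, 2000, 0.6, 8000), (0.5, 5000, 0.7, 6000),
                      time_buffer=0.02, freq_buffer=150), 3))  # 0.111
```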

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `geometry1` | `Geometry` | The first geometry to be compared. | required |
| `geometry2` | `Geometry` | The second geometry to be compared. | required |
| `time_buffer` | `float` | Time buffer in seconds added to each geometry. | `0.01` |
| `freq_buffer` | `float` | Frequency buffer in Hertz added to each geometry. | `100` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `affinity` | `float` | The Intersection over Union (IoU) score, a value between 0 and 1 indicating the degree of overlap. |

Notes
  • If either input geometry is of a time-based type, a specialized time-based affinity calculation is performed.

Examples:

>>> geometry1 = data.BoundingBox(coordinates=[0.4, 2000, 0.6, 8000])
>>> geometry2 = data.BoundingBox(coordinates=[0.5, 5000, 0.7, 6000])
>>> affinity = compute_affinity(
...     geometry1,
...     geometry2,
...     time_buffer=0.02,
...     freq_buffer=150,
... )
>>> print(round(affinity, 3))
0.111

create_tag_encoder(tags) #

Create an encoder object from a list of tags.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | required |

Returns:

| Type | Description |
| --- | --- |
| `SimpleEncoder` | An instance of `SimpleEncoder` initialized with the provided tags. |
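
The internals of `SimpleEncoder` are not shown here, but its documented behavior (map each tag in the input list to its position, return `None` for anything else) can be sketched with a toy stand-in. `ToyTag` and `ToyEncoder` are hypothetical names, not library classes:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class ToyTag:
    # Stand-in for soundevent's data.Tag (frozen so it can be a dict key).
    key: str
    value: str

class ToyEncoder:
    """Encode a known tag as its index in the original list; unknown tags -> None."""

    def __init__(self, tags: Sequence[ToyTag]):
        self._index = {tag: i for i, tag in enumerate(tags)}

    def __call__(self, tag: ToyTag) -> Optional[int]:
        return self._index.get(tag)

    @property
    def num_classes(self) -> int:
        return len(self._index)

dog = ToyTag("animal", "dog")
brown = ToyTag("color", "brown")
encoder = ToyEncoder([dog, brown])
print(encoder(dog), encoder(brown), encoder(ToyTag("animal", "cat")))  # 0 1 None
```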

match_geometries(source, target, time_buffer=0.01, freq_buffer=100, affinity_threshold=0) #

Match geometries between a source and a target sequence.

The matching is performed by first computing an affinity matrix between all pairs of source and target geometries. The affinity is a measure of similarity, calculated as the Intersection over Union (IoU). For more details on how affinity is computed, see soundevent.evaluation.affinity.compute_affinity.

The affinity calculation is influenced by the time_buffer and freq_buffer parameters, which add a buffer to each geometry before comparison. This can help account for small variations in annotations.

Once the affinity matrix is computed, the Hungarian algorithm (via scipy.optimize.linear_sum_assignment) is used to find an optimal assignment of source to target geometries that maximizes the total affinity.

Finally, matches with an affinity below affinity_threshold are discarded and considered as unmatched.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `Sequence[Geometry]` | The source geometries to match. | required |
| `target` | `Sequence[Geometry]` | The target geometries to match. | required |
| `time_buffer` | `float` | A buffer in seconds added to each geometry when computing affinity. See `soundevent.evaluation.affinity.compute_affinity` for details. | `0.01` |
| `freq_buffer` | `float` | A buffer in Hertz added to each geometry when computing affinity. See `soundevent.evaluation.affinity.compute_affinity` for details. | `100` |
| `affinity_threshold` | `float` | The minimum affinity (IoU) for a pair of geometries to be considered a match. Pairs with affinity below this value are treated as unmatched. | `0` |

Returns:

Type Description
Iterable[Tuple[Optional[int], Optional[int], float]]

An iterable of matching results. Each source and target geometry is accounted for exactly once in the output. Each tuple can be one of:

  • (source_index, target_index, affinity): A successful match between a source and a target geometry with an affinity score.
  • (source_index, None, 0): An unmatched source geometry.
  • (None, target_index, 0): An unmatched target geometry.
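
The pipeline described above (affinity matrix, optimal assignment, threshold filter) can be sketched in pure Python. For clarity this toy version brute-forces the assignment instead of calling `scipy.optimize.linear_sum_assignment`, so it is only practical for tiny inputs, and the library's handling of ties and zero-affinity pairs may differ:

```python
from itertools import permutations

def match_by_affinity(affinity, affinity_threshold=0.0):
    """Toy matcher over an affinity matrix (rows = source, cols = target).

    Brute-forces the source-to-target assignment that maximizes total
    affinity, then drops pairs below the threshold, yielding the same
    (source_index, target_index, affinity) tuple shapes as match_geometries.
    """
    n_src = len(affinity)
    n_tgt = len(affinity[0]) if affinity else 0
    k = min(n_src, n_tgt)

    best_perm, best_score = (), float("-inf")
    for perm in permutations(range(n_tgt), k):
        score = sum(affinity[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score

    # Keep only assignments at or above the threshold.
    assigned = {
        i: j for i, j in enumerate(best_perm) if affinity[i][j] >= affinity_threshold
    }
    results = [
        (i, assigned[i], affinity[i][assigned[i]]) if i in assigned else (i, None, 0.0)
        for i in range(n_src)
    ]
    matched_targets = set(assigned.values())
    results.extend((None, j, 0.0) for j in range(n_tgt) if j not in matched_targets)
    return results

matrix = [[0.9, 0.0, 0.1],
          [0.0, 0.7, 0.0]]
print(match_by_affinity(matrix, affinity_threshold=0.5))
# [(0, 0, 0.9), (1, 1, 0.7), (None, 2, 0.0)]
```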

multilabel_encoding(tags, encoder) #

Encode a list of tags into a binary multilabel array.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tags` | `Sequence[Tag]` | A list of tags to be encoded. | required |
| `encoder` | `Encoder` | A callable that takes a `data.Tag` object and returns an optional integer encoding. Tags for which the encoder returns `None` are skipped. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `encoded` | `ndarray` | A binary numpy array of shape `(num_classes,)` representing the multilabel encoding of the input tags. Each index with a corresponding tag is set to 1; the rest are 0. |

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

And we are only interested in encoding the following two classes:

>>> encoder = create_tag_encoder([dog, brown])

Then the following examples show how the multilabel encoding works:

>>> multilabel_encoding([brown], encoder)
array([0, 1]...)
>>> multilabel_encoding([dog, blue], encoder)
array([1, 0]...)
>>> multilabel_encoding([dog, brown], encoder)
array([1, 1]...)
>>> multilabel_encoding([cat], encoder)
array([0, 0]...)
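
The logic behind these examples is a zero vector of length `num_classes` with a 1 written at the index of every tag the encoder recognizes. A pure-Python sketch (the real function returns a numpy array, and the helper name here is hypothetical):

```python
def multilabel_encoding_sketch(tags, encoder, num_classes):
    """Binary vector with a 1 at the index of every tag the encoder recognizes."""
    encoded = [0] * num_classes
    for tag in tags:
        index = encoder(tag)
        if index is not None:  # tags the encoder does not know are skipped
            encoded[index] = 1
    return encoded

# Toy encoder over (key, value) tuples standing in for data.Tag objects.
classes = {("animal", "dog"): 0, ("color", "brown"): 1}
print(multilabel_encoding_sketch([("animal", "dog"), ("color", "blue")],
                                 classes.get, num_classes=2))  # [1, 0]
```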

prediction_encoding(tags, encoder) #

Encode a list of predicted tags into a floating-point array of scores.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tags` | `Sequence[PredictedTag]` | A list of predicted tags to be encoded. | required |
| `encoder` | `Encoder` | A callable that takes a `data.Tag` object and returns an optional integer encoding. Tags for which the encoder returns `None` are skipped. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `encoded` | `ndarray` | A numpy array of floats of shape `(num_classes,)` containing the predicted score for each class corresponding to the input predicted tags. |

Examples:

Consider the following set of tags:

>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")

And we are only interested in encoding the following two classes:

>>> encoder = create_tag_encoder([dog, brown])

Then the following examples show how the encoding works for predicted tags:

>>> prediction_encoding([data.PredictedTag(tag=brown, score=0.5)], encoder)
array([0. , 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=blue, score=0.9),
...     ],
...     encoder,
... )
array([0.2, 0. ]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=dog, score=0.2),
...         data.PredictedTag(tag=brown, score=0.5),
...     ],
...     encoder,
... )
array([0.2, 0.5]...)
>>> prediction_encoding(
...     [
...         data.PredictedTag(tag=cat, score=0.7),
...     ],
...     encoder,
... )
array([0., 0.]...)