Evaluation Module
Additional dependencies
To use the soundevent.evaluation module you need to install some additional dependencies. Make sure you have them installed by running the following command:
soundevent.evaluation
Evaluation functions.
Modules:
Name | Description |
---|---|
affinity | Measures of affinity between sound event geometries. |
clip_classification | |
clip_multilabel_classification | |
encoding | Tag Encoder Module. |
match | Algorithms for matching geometries. |
metrics | |
sound_event_classification | Sound event classification evaluation. |
sound_event_detection | Sound event detection evaluation. |
tasks | |
Functions:
Name | Description |
---|---|
classification_encoding | Encode a list of tags into an integer value. |
compute_affinity | Compute the geometric affinity between two geometries. |
create_tag_encoder | Create an encoder object from a list of tags. |
match_geometries | Match geometries between a source and a target sequence. |
multilabel_encoding | Encode a list of tags into a binary multilabel array. |
prediction_encoding | Encode a list of predicted tags into a floating-point array of scores. |
Functions
classification_encoding(tags, encoder)
Encode a list of tags into an integer value.
This function is commonly used for mapping a list of tags to a compact integer representation, typically representing classes associated with objects like clips or sound events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tags | Sequence[Tag] | A list of tags to be encoded. | required |
encoder | Callable[[Tag], Optional[int]] | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | required |
Returns:
Name | Type | Description |
---|---|---|
encoded | Optional[int] | The encoded integer value representing the tags, or None if no encoding is available. |
Examples:
Consider the following set of tags:
>>> from soundevent import data
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
If we are interested in encoding only the 'dog' and 'brown' classes, the following examples demonstrate how the encoding works for a list of tags:
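The behaviour described above can be sketched without soundevent itself. In this minimal illustration the `Tag` tuple, the `make_encoder` helper and the `classify` function are hypothetical stand-ins, and the rule that the first tag with an encoding wins is an assumption, not a statement about the library's implementation.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Tag(NamedTuple):
    """Hypothetical stand-in for data.Tag (a key/value pair)."""

    key: str
    value: str


def make_encoder(classes: Sequence[Tag]) -> Callable[[Tag], Optional[int]]:
    """Hypothetical helper: map each target tag to its position in `classes`."""
    index = {tag: i for i, tag in enumerate(classes)}
    # dict.get returns None for tags that are not among the target classes.
    return lambda tag: index.get(tag)


def classify(
    tags: Sequence[Tag],
    encoder: Callable[[Tag], Optional[int]],
) -> Optional[int]:
    """Return the code of the first tag the encoder recognises, else None.

    Assumption: tags encoded as None are skipped and the first hit wins.
    """
    for tag in tags:
        code = encoder(tag)
        if code is not None:
            return code
    return None


dog = Tag("animal", "dog")
cat = Tag("animal", "cat")
brown = Tag("color", "brown")
blue = Tag("color", "blue")

# Only 'dog' (class 0) and 'brown' (class 1) are of interest.
encoder = make_encoder([dog, brown])

print(classify([brown], encoder))      # 1
print(classify([dog, blue], encoder))  # 0, 'blue' has no encoding
print(classify([cat], encoder))        # None, nothing could be encoded
```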
compute_affinity(geometry1, geometry2, time_buffer=0.01, freq_buffer=100)
Compute the geometric affinity between two geometries.
This function calculates the geometric similarity between two geometries, which is a measure of how much they overlap. The affinity is computed as the Intersection over Union (IoU).
Intersection over Union (IoU)
IoU is a standard metric for comparing the similarity between two shapes. It is calculated as the ratio of the area of the overlap between the two geometries to the area of their combined shape.
$$
\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}
$$
An IoU of 1 means the geometries are identical, while an IoU of 0 means they do not overlap at all. This is particularly useful in bioacoustics for comparing annotations or predictions of sound events in a time-frequency representation (spectrogram).
To account for small variations in annotations, a buffer can be added to each geometry before computing the IoU. This is controlled by the time_buffer and freq_buffer parameters.
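For axis-aligned boxes, the buffered IoU can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the `Box` tuple layout, the `expand` and `box_iou` helpers, and the choice to grow each box by the buffer on every side are all assumptions made for this example.

```python
from typing import Tuple

# Hypothetical box layout: (start_time, low_freq, end_time, high_freq)
Box = Tuple[float, float, float, float]


def expand(box: Box, time_buffer: float, freq_buffer: float) -> Box:
    """Grow a box by the buffers on every side (assumed buffering rule)."""
    t0, f0, t1, f1 = box
    return (t0 - time_buffer, f0 - freq_buffer, t1 + time_buffer, f1 + freq_buffer)


def box_iou(
    a: Box,
    b: Box,
    time_buffer: float = 0.01,
    freq_buffer: float = 100,
) -> float:
    """Intersection over Union of two buffered time-frequency boxes."""
    a = expand(a, time_buffer, freq_buffer)
    b = expand(b, time_buffer, freq_buffer)

    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    overlap_t = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_f = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_t * overlap_f

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


first = (0.0, 1000.0, 1.0, 2000.0)
second = (0.5, 1000.0, 1.5, 2000.0)  # shifted by half a second

print(box_iou(first, first))                                  # identical boxes
print(box_iou(first, second, time_buffer=0, freq_buffer=0))   # partial overlap
print(box_iou(first, second))                                 # same, with buffers
```

In this sketch the buffered score for the shifted pair is slightly higher than the unbuffered one, which is the tolerance to small annotation differences that the buffers are meant to provide.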
Parameters:
Name | Type | Description | Default |
---|---|---|---|
geometry1 | Geometry | The first geometry to be compared. | required |
geometry2 | Geometry | The second geometry to be compared. | required |
time_buffer | float | Time buffer in seconds added to each geometry. Default is 0.01. | 0.01 |
freq_buffer | float | Frequency buffer in Hertz added to each geometry. Default is 100. | 100 |
Returns:
Name | Type | Description |
---|---|---|
affinity | float | The Intersection over Union (IoU) score, a value between 0 and 1 indicating the degree of overlap. |
Notes
- If either input geometry is of a time-based type, a specialized time-based affinity calculation is performed.
Examples:
create_tag_encoder(tags)
match_geometries(source, target, time_buffer=0.01, freq_buffer=100, affinity_threshold=0)
Match geometries between a source and a target sequence.
The matching is performed by first computing an affinity matrix between all pairs of source and target geometries. The affinity is a measure of similarity, calculated as the Intersection over Union (IoU). For more details on how affinity is computed, see soundevent.evaluation.affinity.compute_affinity.
The affinity calculation is influenced by the time_buffer and freq_buffer parameters, which add a buffer to each geometry before comparison. This can help account for small variations in annotations.
Once the affinity matrix is computed, the Hungarian algorithm (via scipy.optimize.linear_sum_assignment) is used to find an optimal assignment of source to target geometries that maximizes the total affinity.
Finally, matches with an affinity below affinity_threshold are discarded and treated as unmatched.
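The assignment and thresholding steps above can be sketched as follows, starting from a precomputed affinity matrix. This is an illustration under stated assumptions rather than the library's code: `match_by_affinity` is a hypothetical helper, pairs at or below the threshold are dropped, and unmatched items are reported with None on the missing side and an affinity of 0.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_by_affinity(affinity, affinity_threshold=0.0):
    """Match rows (sources) to columns (targets) of an affinity matrix.

    Returns a list of (source_index, target_index, affinity) tuples.
    Assumption: unmatched items get None for the missing side and affinity 0.
    """
    affinity = np.asarray(affinity, dtype=float)
    n_source, n_target = affinity.shape

    # Hungarian algorithm: maximise the total affinity of the assignment.
    rows, cols = linear_sum_assignment(affinity, maximize=True)

    matches = []
    used_sources, used_targets = set(), set()
    for s, t in zip(rows.tolist(), cols.tolist()):
        score = float(affinity[s, t])
        # Assumption: pairs at or below the threshold are not matches, so
        # zero-overlap pairs are dropped with the default threshold of 0.
        if score <= affinity_threshold:
            continue
        matches.append((s, t, score))
        used_sources.add(s)
        used_targets.add(t)

    # Account for every remaining source and target exactly once.
    unmatched_sources = [
        (s, None, 0.0) for s in range(n_source) if s not in used_sources
    ]
    unmatched_targets = [
        (None, t, 0.0) for t in range(n_target) if t not in used_targets
    ]
    return matches + unmatched_sources + unmatched_targets


# A greedy choice would pair source 0 with target 0 (0.6) and leave source 1
# with a zero-affinity partner; the optimal assignment totals 1.0 instead.
print(match_by_affinity([[0.6, 0.5], [0.5, 0.0]]))
# [(0, 1, 0.5), (1, 0, 0.5)]
```

The example matrix is chosen so that the globally optimal assignment differs from the pairing a greedy best-first strategy would produce, which is the reason for solving an assignment problem rather than matching each source to its best target independently.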
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | Sequence[Geometry] | The source geometries to match. | required |
target | Sequence[Geometry] | The target geometries to match. | required |
time_buffer | float | A buffer in seconds added to each geometry when computing affinity. See soundevent.evaluation.affinity.compute_affinity. | 0.01 |
freq_buffer | float | A buffer in Hertz added to each geometry when computing affinity. See soundevent.evaluation.affinity.compute_affinity. | 100 |
affinity_threshold | float | The minimum affinity (IoU) for a pair of geometries to be considered a match. Pairs with affinity below this value are considered unmatched. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
Iterable[Tuple[Optional[int], Optional[int], float]] | An iterable of matching results. Each source and target geometry is accounted for exactly once in the output. Each tuple can be one of: |
multilabel_encoding(tags, encoder)
Encode a list of tags into a binary multilabel array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tags | Sequence[Tag] | A list of tags to be encoded. | required |
encoder | Encoder | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | required |
Returns:
Name | Type | Description |
---|---|---|
encoded | ndarray | A binary numpy array of shape (num_classes,) representing the multilabel encoding for the input tags. Each index with a corresponding tag is set to 1, and the rest are 0. |
Examples:
Consider the following set of tags:
>>> from soundevent import data
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
And we are only interested in encoding the following two classes:
Then the following examples show how the multilabel encoding works:
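A stand-alone sketch of the same idea, written with NumPy only, is given here for illustration. The `Tag` tuple, the `class_index` dictionary and the `multilabel` helper are hypothetical stand-ins for the library's tag and encoder objects, and the float32 output type is an assumption.

```python
from typing import Dict, NamedTuple, Sequence

import numpy as np


class Tag(NamedTuple):
    """Hypothetical stand-in for data.Tag (a key/value pair)."""

    key: str
    value: str


def multilabel(tags: Sequence[Tag], class_index: Dict[Tag, int]) -> np.ndarray:
    """Set a 1 at the index of every recognised tag; other tags are skipped."""
    encoded = np.zeros(len(class_index), dtype=np.float32)
    for tag in tags:
        index = class_index.get(tag)  # None for tags outside the target classes
        if index is not None:
            encoded[index] = 1
    return encoded


dog = Tag("animal", "dog")
cat = Tag("animal", "cat")
brown = Tag("color", "brown")
blue = Tag("color", "blue")

# Only 'dog' (index 0) and 'brown' (index 1) are target classes.
class_index = {dog: 0, brown: 1}

print(multilabel([brown], class_index).tolist())       # [0.0, 1.0]
print(multilabel([dog, blue], class_index).tolist())   # [1.0, 0.0]
print(multilabel([dog, brown], class_index).tolist())  # [1.0, 1.0]
print(multilabel([cat], class_index).tolist())         # [0.0, 0.0]
```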
prediction_encoding(tags, encoder)
Encode a list of predicted tags into a floating-point array of scores.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tags | Sequence[PredictedTag] | A list of predicted tags to be encoded. | required |
encoder | Encoder | A callable object that takes a data.Tag object as input and returns an optional integer encoding. If the encoder returns None for a tag, it will be skipped. | required |
Returns:
Name | Type | Description |
---|---|---|
encoded | ndarray | A numpy array of floats of shape (num_classes,) representing the predicted scores for each class. The array contains the scores for each class corresponding to the input predicted tags. |
Examples:
Consider the following set of tags:
>>> from soundevent import data
>>> dog = data.Tag(key="animal", value="dog")
>>> cat = data.Tag(key="animal", value="cat")
>>> brown = data.Tag(key="color", value="brown")
>>> blue = data.Tag(key="color", value="blue")
And we are only interested in encoding the following two classes:
Then the following examples show how the encoding works for predicted tags:
>>> prediction_encoding([data.PredictedTag(tag=brown, score=0.5)], encoder)
array([0. , 0.5]...)
>>> prediction_encoding(
... [
... data.PredictedTag(tag=dog, score=0.2),
... data.PredictedTag(tag=blue, score=0.9),
... ],
... encoder,
... )
array([0.2, 0. ]...)
>>> prediction_encoding(
... [
... data.PredictedTag(tag=dog, score=0.2),
... data.PredictedTag(tag=brown, score=0.5),
... ],
... encoder,
... )
array([0.2, 0.5]...)
>>> prediction_encoding(
... [
... data.PredictedTag(tag=cat, score=0.7),
... ],
... encoder,
... )
array([0., 0.]...)