
Sound Event Detection#

Sound Event Detection (SED) is the task of identifying the presence of sound events in an audio recording, estimating their temporal positions (start and end times), and classifying them into predefined categories.

soundevent.evaluation.tasks.sound_event_detection #

Sound event detection evaluation.

Functions:

evaluate_clip
evaluate_sound_event_detection
    Evaluate sound event detections against ground truth annotations.
sound_event_detection

Attributes#

EXAMPLE_METRICS = () module-attribute #

RUN_METRICS = ((terms.mean_average_precision, metrics.mean_average_precision), (terms.balanced_accuracy, metrics.balanced_accuracy), (terms.accuracy, metrics.accuracy), (terms.top_3_accuracy, metrics.top_3_accuracy)) module-attribute #

SOUNDEVENT_METRICS = ((terms.true_class_probability, metrics.true_class_probability),) module-attribute #

Classes#

ClipPrediction #

Bases: Protocol, Generic[Detection]

Protocol defining the requirements for a clip prediction object.

Attributes:

Name Type Description
clip Clip
detections Sequence[Detection]
Attributes#
clip instance-attribute #
detections instance-attribute #
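
As a rough illustration of what satisfying this protocol could look like, the sketch below defines a stand-in protocol with the same two attributes and a small dataclass that conforms to it. `FakeClip`, `ClipPredictionLike`, and `MyPrediction` are hypothetical names used only to keep the example self-contained; they are not part of soundevent, and real code would use the library's own `Clip` type.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence, Tuple, TypeVar, runtime_checkable

Detection = TypeVar("Detection")


@dataclass
class FakeClip:
    """Hypothetical stand-in for the library's Clip object."""

    uuid: str


@runtime_checkable
class ClipPredictionLike(Protocol[Detection]):
    """Illustrative protocol: any object exposing `clip` and `detections`."""

    clip: FakeClip
    detections: Sequence[Detection]


@dataclass
class MyPrediction:
    """A plain container that structurally satisfies the protocol above."""

    clip: FakeClip
    # Each detection here is an assumed (start, end, label) tuple.
    detections: List[Tuple[float, float, str]] = field(default_factory=list)


pred = MyPrediction(clip=FakeClip("clip-1"), detections=[(0.0, 1.0, "bird")])
print(isinstance(pred, ClipPredictionLike))
```

Because the check is structural, the prediction class does not need to inherit from the protocol; it only needs the two attributes.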

Functions#

compute_overall_metrics(true_classes, predicted_classes_scores) #

Compute evaluation metrics based on true classes and predicted scores.
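
To make the relationship between true classes and predicted scores concrete, here is a minimal sketch of accuracy and top-3 accuracy, two of the metrics named in RUN_METRICS. It assumes true classes are integer indices and scores are one row of per-class scores per example. `top_k_accuracy` is a hypothetical helper written for this illustration, not the library's implementation.

```python
from typing import Sequence


def top_k_accuracy(
    true_classes: Sequence[int],
    scores: Sequence[Sequence[float]],
    k: int = 1,
) -> float:
    """Fraction of examples whose true class is among the k highest scores."""
    if len(true_classes) == 0:
        raise ValueError("at least one example is required")
    if len(true_classes) != len(scores):
        raise ValueError("true_classes and scores must have the same length")

    hits = 0
    for true_index, row in zip(true_classes, scores):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(true_classes)


true_classes = [0, 1, 2]
scores = [
    [0.7, 0.2, 0.1, 0.0],
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.3, 0.1, 0.2],
]
accuracy = top_k_accuracy(true_classes, scores, k=1)
top_3 = top_k_accuracy(true_classes, scores, k=3)
```

With `k=1` this reduces to plain accuracy; larger `k` credits a prediction whenever the true class appears among the top-ranked candidates.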

evaluate_clip(clip_annotations, clip_predictions, encoder) #

evaluate_sound_event(sound_event_prediction, sound_event_annotation, encoder) #

evaluate_sound_event_detection(clip_predictions, clip_annotations, affinity, score=None, affinity_threshold=0, strict_match=False) #

Evaluate sound event detections against ground truth annotations.

This function matches predictions to annotations for each clip individually.

Parameters:

clip_predictions (Sequence[ClipPrediction[Detection]]; required)
    A sequence of prediction objects. Each object must contain a reference to the clip and a sequence of detections.

clip_annotations (Sequence[ClipAnnotation]; required)
    A sequence of ground truth annotations corresponding to the same clips.

affinity (Callable[[Detection, SoundEventAnnotation], float]; required)
    A function that computes the affinity score (e.g., IoU) between a detection and a ground truth annotation.

score (Callable[[Detection], float] | None; default: None)
    A function to extract the confidence score from a detection. Used to sort detections greedily. If None, detections are processed in the order provided.

affinity_threshold (float; default: 0)
    The minimum affinity score required for a valid match. Matches with scores less than or equal to this value are discarded. Defaults to 0.0.

strict_match (bool; default: False)
    If True, a detection is only matched if its highest affinity target is available. If False (default), it falls back to the next best available target.
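
The interaction between `affinity`, `score`, `affinity_threshold`, and `strict_match` can be sketched as a simple greedy matcher. This is an illustrative reading of the parameter descriptions above, not the library's actual code: `greedy_match` and `interval_iou` are hypothetical names, and detections and targets are assumed to be plain `(start, end)` intervals so that the example stays self-contained.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple, TypeVar

D = TypeVar("D")
T = TypeVar("T")
Interval = Tuple[float, float]


def interval_iou(a: Interval, b: Interval) -> float:
    """Temporal intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    detections: Sequence[D],
    targets: Sequence[T],
    affinity: Callable[[D, T], float],
    score: Optional[Callable[[D], float]] = None,
    affinity_threshold: float = 0.0,
    strict_match: bool = False,
) -> List[Tuple[Optional[int], Optional[int], float]]:
    """Return (detection_index, target_index, affinity); None marks no match."""
    order = list(range(len(detections)))
    if score is not None:
        # Highest-confidence detections pick their target first.
        order.sort(key=lambda i: score(detections[i]), reverse=True)

    taken: Set[int] = set()
    matches: List[Tuple[Optional[int], Optional[int], float]] = []
    for i in order:
        # Rank every target by affinity to this detection, best first.
        ranked = sorted(
            ((affinity(detections[i], t), j) for j, t in enumerate(targets)),
            key=lambda pair: pair[0],
            reverse=True,
        )
        chosen: Optional[Tuple[int, float]] = None
        for value, j in ranked:
            if value <= affinity_threshold:
                break  # at or below the threshold: not a valid match
            if j not in taken:
                chosen = (j, value)
                break
            if strict_match:
                break  # best target already taken, so do not fall back
        if chosen is None:
            matches.append((i, None, 0.0))
        else:
            taken.add(chosen[0])
            matches.append((i, chosen[0], chosen[1]))

    # Targets that no detection claimed are reported as unmatched.
    for j in range(len(targets)):
        if j not in taken:
            matches.append((None, j, 0.0))
    return matches


detections = [(0.0, 1.0), (0.1, 1.1)]
targets = [(0.0, 1.0), (1.0, 2.0)]
loose = greedy_match(detections, targets, interval_iou)
strict = greedy_match(detections, targets, interval_iou, strict_match=True)
```

In this toy input both detections prefer the first target. Without `strict_match`, the second detection falls back to the second target, which it barely overlaps; with `strict_match=True` it is left unmatched instead. Raising `affinity_threshold` is another way to reject such weak fallback matches.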

Yields:

clip (Clip)
    The clip associated with the match.

match (Match[Detection, SoundEventAnnotation])
    A named tuple containing the matching results; see Match.

Raises:

ValueError
    If the number of predictions and annotations differs, or if the sets of clip UUIDs do not match exactly.
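
The two failure conditions described here amount to a length check followed by a set comparison of clip identifiers. A minimal sketch, assuming each prediction and annotation can be reduced to its clip UUID beforehand; `check_clips_align` is a hypothetical helper, not a function exported by the library:

```python
import uuid
from typing import Sequence


def check_clips_align(
    prediction_clip_ids: Sequence[uuid.UUID],
    annotation_clip_ids: Sequence[uuid.UUID],
) -> None:
    """Raise ValueError unless both sides cover exactly the same clips."""
    if len(prediction_clip_ids) != len(annotation_clip_ids):
        raise ValueError(
            f"Got {len(prediction_clip_ids)} predictions but "
            f"{len(annotation_clip_ids)} annotations."
        )
    if set(prediction_clip_ids) != set(annotation_clip_ids):
        raise ValueError("Prediction and annotation clip UUIDs do not match.")


clip_a, clip_b = uuid.uuid4(), uuid.uuid4()
# Order does not matter in this sketch; only the set of clips is compared.
check_clips_align([clip_a, clip_b], [clip_b, clip_a])
```

Running a check like this before evaluation surfaces a mismatched dataset early, rather than letting it produce misleading per-clip results.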

sound_event_detection(clip_predictions, clip_annotations, tags) #