Sound Event Detection#
Sound Event Detection (SED) is the task of identifying the presence of sound events in an audio recording, estimating their temporal positions (start and end times), and classifying them into predefined categories.
soundevent.evaluation.tasks.sound_event_detection
Sound event detection evaluation.
Functions:
| Name | Description |
|---|---|
| `evaluate_clip` | |
| `evaluate_sound_event_detection` | Evaluate sound event detections against ground truth annotations. |
| `sound_event_detection` | |
Attributes#
EXAMPLE_METRICS = ()
module-attribute
RUN_METRICS = ((terms.mean_average_precision, metrics.mean_average_precision), (terms.balanced_accuracy, metrics.balanced_accuracy), (terms.accuracy, metrics.accuracy), (terms.top_3_accuracy, metrics.top_3_accuracy))
module-attribute
SOUNDEVENT_METRICS = ((terms.true_class_probability, metrics.true_class_probability),)
module-attribute
Classes#
ClipPrediction
Functions#
compute_overall_metrics(true_classes, predicted_classes_scores)
Compute evaluation metrics based on true classes and predicted scores.
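As a rough illustration of what metrics such as accuracy, top-3 accuracy, and balanced accuracy measure, the following pure-Python sketch computes them from true class indices and per-class scores. The input layout (integer class indices and one list of scores per sound event) and the helper names `top_k_accuracy` and `balanced_accuracy` are assumptions made for this example; they are not the library's implementation, and mean average precision is left out for brevity.

```python
from collections import defaultdict


def top_k_accuracy(true_classes, scores, k=1):
    """Fraction of examples whose true class is among the k highest scores."""
    hits = 0
    for true, row in zip(true_classes, scores):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += true in ranked[:k]
    return hits / len(true_classes)


def balanced_accuracy(true_classes, scores):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, row in zip(true_classes, scores):
        predicted = max(range(len(row)), key=lambda c: row[c])
        total[true] += 1
        correct[true] += predicted == true
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical data: four sound events, three classes.
example_true = [0, 0, 0, 1]
example_scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.5, 0.1, 0.4],
]
```

On this data plain accuracy looks good because class 0 dominates, while balanced accuracy is lower because the single class 1 event is misclassified.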
evaluate_clip(clip_annotations, clip_predictions, encoder)
evaluate_sound_event(sound_event_prediction, sound_event_annotation, encoder)
evaluate_sound_event_detection(clip_predictions, clip_annotations, affinity, score=None, affinity_threshold=0, strict_match=False)
Evaluate sound event detections against ground truth annotations.
This function matches predictions to annotations for each clip individually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `clip_predictions` | `Sequence[ClipPrediction[Detection]]` | A sequence of prediction objects. Each object must contain a reference to the clip and a sequence of detections. | required |
| `clip_annotations` | `Sequence[ClipAnnotation]` | A sequence of ground truth annotations corresponding to the same clips. | required |
| `affinity` | `Callable[[Detection, SoundEventAnnotation], float]` | A function that computes the affinity score (e.g., IoU) between a detection and a ground truth annotation. | required |
| `score` | `Callable[[Detection], float] \| None` | A function to extract the confidence score from a detection. Used to sort detections greedily. If None, detections are processed in the order provided. | `None` |
| `affinity_threshold` | `float` | The minimum affinity score required for a valid match. Matches with scores less than or equal to this value are discarded. Defaults to 0.0. | `0` |
| `strict_match` | `bool` | If True, a detection is only matched if its highest affinity target is available. If False (default), it falls back to the next best available target. | `False` |
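The interaction between `score`, `affinity_threshold`, and `strict_match` can be sketched for a single clip in plain Python. This is a minimal illustration of greedy matching under the semantics described in the table, not the library's implementation: the helper names `temporal_iou` and `greedy_match`, the `(start, end)` interval representation, and the `(detection_index, target_index, affinity)` result tuples are all assumptions made for this example.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) time intervals in seconds."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def greedy_match(detections, targets, affinity, score=None,
                 affinity_threshold=0.0, strict_match=False):
    """Return (detection_index, target_index, affinity) tuples for one clip.

    Unmatched detections have target_index None; unmatched targets
    have detection_index None.
    """
    order = list(range(len(detections)))
    if score is not None:
        # Highest-confidence detections get first pick of the targets.
        order.sort(key=lambda i: score(detections[i]), reverse=True)

    taken = set()
    matches = []
    for i in order:
        # Candidate targets, best affinity first, strictly above the threshold.
        ranked = sorted(
            ((affinity(detections[i], t), j) for j, t in enumerate(targets)),
            key=lambda pair: pair[0],
            reverse=True,
        )
        ranked = [(a, j) for a, j in ranked if a > affinity_threshold]
        if strict_match:
            # Only the single best target is considered; no fallback.
            ranked = ranked[:1]
        chosen = next(((a, j) for a, j in ranked if j not in taken), None)
        if chosen is None:
            matches.append((i, None, 0.0))
        else:
            taken.add(chosen[1])
            matches.append((i, chosen[1], chosen[0]))

    # Targets that no detection claimed.
    matches.extend(
        (None, j, 0.0) for j in range(len(targets)) if j not in taken
    )
    return matches


# Hypothetical data: detections are ((start, end), confidence) pairs.
example_detections = [((0.0, 1.0), 0.6), ((0.1, 1.1), 0.9), ((5.0, 6.0), 0.8)]
example_targets = [(0.0, 1.0), (0.9, 2.0)]


def example_affinity(detection, target):
    return temporal_iou(detection[0], target)


def example_score(detection):
    return detection[1]
```

With this data, the detection with confidence 0.9 claims the first target. The lower-confidence detection that overlaps the same target falls back to the second target when `strict_match=False`, and stays unmatched when `strict_match=True` or when `affinity_threshold` is raised above its small overlap with the second target.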
Yields:
| Name | Type | Description |
|---|---|---|
| `clip` | `Clip` | The clip associated with the match. |
| `match` | `Match[Detection, SoundEventAnnotation]` | A named tuple containing the matching results, see [ |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the number of predictions and annotations differs, or if the sets of clip UUIDs do not match exactly. |
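The consistency check described under Raises can be pictured as follows. This is a hedged sketch that works on bare lists of clip UUIDs; the helper name `pair_by_clip` and its return format are hypothetical and are not taken from the library.

```python
from uuid import uuid4


def pair_by_clip(prediction_uuids, annotation_uuids):
    """Pair predictions with annotations by clip UUID.

    Returns (prediction_index, annotation_index) tuples in prediction order.
    Raises ValueError if the two inputs do not describe the same clips.
    """
    if len(prediction_uuids) != len(annotation_uuids):
        raise ValueError("Number of predictions and annotations differs.")
    if set(prediction_uuids) != set(annotation_uuids):
        raise ValueError("Clip UUIDs of predictions and annotations differ.")
    # Look up each annotation's position by its clip UUID.
    position = {uuid: j for j, uuid in enumerate(annotation_uuids)}
    return [(i, position[uuid]) for i, uuid in enumerate(prediction_uuids)]


# Hypothetical clips: the two sequences may be ordered differently.
clip_a, clip_b = uuid4(), uuid4()
example_pairs = pair_by_clip([clip_a, clip_b], [clip_b, clip_a])
```

Pairing by UUID rather than by position means the two sequences only need to cover the same clips, not share the same order.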