Prediction#

In soundevent, an Annotation is an expert human interpretation that is often treated as ground truth. A Prediction, by contrast, is an interpretation derived through uncertain means that involve some degree of guesswork. Predictions may be the outputs of algorithms designed to automate the annotation task, or speculative judgments from other people about the "truth" of the sounds they are analyzing.

soundevent provides three objects for representing predictions: SoundEventPrediction, SequencePrediction, and ClipPrediction. Each stores a prediction that estimates the true state of its corresponding entity (a sound event, a sequence, or a clip). All prediction objects share a similar structure: an overall confidence score that expresses uncertainty, and a list of PredictedTags, which resemble regular tags but carry an additional confidence score for each tag.

Sound Event Predictions#

The SoundEventPrediction object represents a single sound event predicted through uncertain means. It holds the predicted sound event, an overall confidence score, and a list of predicted tags describing the expected characteristics of the sound event.

erDiagram
    SoundEventPrediction {
        UUID uuid
        float score
    }
    SoundEvent
    PredictedTag {
        float score
    }
    Tag
    SoundEventPrediction ||--|| SoundEvent : soundevent
    SoundEventPrediction ||--o{ PredictedTag : tags
    PredictedTag }|--|| Tag : tag
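
The relationships in the diagram above can be sketched with plain Python dataclasses. This is a minimal illustration, not the library's actual classes: the key/value shape of Tag, the field name sound_event, and the example tag values are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a regular tag (assumed key/value label)."""
    key: str
    value: str


@dataclass(frozen=True)
class PredictedTag:
    """A tag paired with the confidence score assigned to it."""
    tag: Tag
    score: float


@dataclass
class SoundEvent:
    """Placeholder for the sound event being predicted (details omitted)."""
    uuid: UUID = field(default_factory=uuid4)


@dataclass
class SoundEventPrediction:
    """One predicted sound event, its overall score, and its predicted tags."""
    sound_event: SoundEvent
    score: float
    tags: List[PredictedTag] = field(default_factory=list)
    uuid: UUID = field(default_factory=uuid4)


# Example values are invented for illustration only.
prediction = SoundEventPrediction(
    sound_event=SoundEvent(),
    score=0.9,
    tags=[
        PredictedTag(tag=Tag("species", "Myotis myotis"), score=0.7),
        PredictedTag(tag=Tag("call_type", "echolocation"), score=0.95),
    ],
)

# The overall score and the per-tag scores are separate quantities:
# a confidently detected event can still carry an uncertain tag.
best = max(prediction.tags, key=lambda t: t.score)
print(best.tag.key, best.tag.value, best.score)
```

Keeping the score on PredictedTag rather than on Tag means the same Tag can be reused across annotations and predictions unchanged.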

Sequence Predictions#

The SequencePrediction object represents a predicted sequence of sound events. Much like Sound Event Predictions, it encapsulates information about the sequence, offering an overall confidence score for the prediction and a list of PredictedTags.

erDiagram
    SequencePrediction {
        UUID uuid
        float score
    }
    Sequence
    PredictedTag {
        float score
    }
    Tag
    SequencePrediction ||--|| Sequence : sequence
    SequencePrediction ||--o{ PredictedTag : tags
    PredictedTag }|--|| Tag : tag
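
Because each PredictedTag carries its own score, a consumer can keep only the tags it trusts. The sketch below assumes a simplified SequencePrediction with a flattened tag (key, value, score in one object) and a hypothetical helper confident_tags; neither is taken from the library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PredictedTag:
    """Simplified predicted tag: the tag's key/value flattened next to its score."""
    key: str
    value: str
    score: float


@dataclass
class SequencePrediction:
    """Simplified sequence prediction: overall score plus predicted tags."""
    score: float
    tags: List[PredictedTag] = field(default_factory=list)


def confident_tags(prediction: SequencePrediction, threshold: float) -> List[PredictedTag]:
    """Return the predicted tags whose score is at least `threshold`."""
    return [tag for tag in prediction.tags if tag.score >= threshold]


# Example values are invented for illustration only.
prediction = SequencePrediction(
    score=0.8,
    tags=[
        PredictedTag("species", "Tyto alba", 0.9),
        PredictedTag("behaviour", "song", 0.4),
    ],
)

kept = confident_tags(prediction, threshold=0.5)
print([f"{tag.key}={tag.value}" for tag in kept])
```

The threshold is a choice of the consumer, not a property of the prediction, so it is passed in rather than stored.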

Clip Predictions#

The ClipPrediction object gathers all predictions made for an entire Clip, including any predictions for sound events or sequences within it. Like the other prediction objects, it has fields for an overall confidence score and for predicted tags.

Predicted tags at the clip level are straightforward to interpret: they apply to the entire acoustic content of the clip, which is useful for tasks such as sound scene classification. The score field, however, is interpreted differently than in sound event or sequence predictions. Rather than expressing confidence that a sound event or sequence is present, the clip score can encode the confidence of a binary classification problem, where a low score indicates that the clip would not be considered a positive example. How scores are used to encode this information is left entirely to the user.

erDiagram
    ClipPrediction {
        UUID uuid
        float score
    }
    Clip
    PredictedTag {
        float score
    }
    Tag
    SoundEventPrediction
    SequencePrediction
    ClipPrediction }|--|| Clip : clip
    ClipPrediction ||--o{ PredictedTag : tags
    ClipPrediction ||--o{ SoundEventPrediction : sound_events
    ClipPrediction ||--o{ SequencePrediction : sequences
    PredictedTag }|--|| Tag : tag
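
One way to read the clip score as a binary classification confidence is sketched below. The classes are simplified stand-ins, the clip is reduced to a hypothetical clip_name string, and the 0.5 threshold in is_positive is an arbitrary assumption; as noted above, how the score is used is up to the user.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoundEventPrediction:
    """Simplified sound event prediction (only the overall score is kept)."""
    score: float


@dataclass
class ClipPrediction:
    """Simplified clip prediction: clip score plus nested sound event predictions."""
    clip_name: str  # hypothetical stand-in for the Clip object
    score: float
    sound_events: List[SoundEventPrediction] = field(default_factory=list)


def is_positive(prediction: ClipPrediction, threshold: float = 0.5) -> bool:
    """Treat the clip score as the confidence of the positive class."""
    return prediction.score >= threshold


# Example values are invented for illustration only.
predictions = [
    ClipPrediction(
        clip_name="clip_a",
        score=0.92,
        sound_events=[SoundEventPrediction(0.88), SoundEventPrediction(0.61)],
    ),
    ClipPrediction(clip_name="clip_b", score=0.08),
]

positives = [p.clip_name for p in predictions if is_positive(p)]
print(positives)
```

The clip score here is independent of the nested sound event scores: it answers a question about the clip as a whole, not about any single event.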

Model Runs#

The ModelRun object stores a collection of predictions that originate from the same source. It holds a set of clip predictions together with the model's name, an optional version (for precise method tracking), and a description of the method employed.

erDiagram
    ModelRun {
        UUID uuid
        datetime created_on
        str name
        str version
        str description
    }
    ClipPrediction
    ModelRun ||--|{ ClipPrediction : clip_predictions
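
Grouping predictions by source might look like the following sketch. The field names follow the diagram above, but the classes themselves, the model name, and the summary helper mean_clip_score are hypothetical and only illustrate the idea.

```python
import datetime
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class ClipPrediction:
    """Simplified clip prediction (clip reduced to a hypothetical name)."""
    clip_name: str
    score: float


def _utc_now() -> datetime.datetime:
    """Timezone-aware timestamp used as the default creation time."""
    return datetime.datetime.now(datetime.timezone.utc)


@dataclass
class ModelRun:
    """All clip predictions produced by one source, plus provenance metadata."""
    name: str
    clip_predictions: List[ClipPrediction]
    version: Optional[str] = None  # optional, for precise method tracking
    description: str = ""
    uuid: UUID = field(default_factory=uuid4)
    created_on: datetime.datetime = field(default_factory=_utc_now)


def mean_clip_score(run: ModelRun) -> float:
    """Average clip score across the run (hypothetical summary helper)."""
    return sum(p.score for p in run.clip_predictions) / len(run.clip_predictions)


# Example values are invented for illustration only.
run = ModelRun(
    name="example-detector",
    version="1.0",
    description="Illustrative run over two clips.",
    clip_predictions=[
        ClipPrediction("clip_a", 0.75),
        ClipPrediction("clip_b", 0.25),
    ],
)

print(run.name, run.version, mean_clip_score(run))
```

Storing the name and version alongside the predictions keeps outputs from different models, or different versions of the same model, distinguishable when they are compared later.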