Prediction
While an Annotation involves expert human interpretation and is often treated
as ground truth, soundevent uses the term Prediction to denote interpretations
derived through uncertain means, involving some degree of guesswork. These
interpretations may be the outputs of algorithms designed to automate the
annotation task, or the speculative judgments of other individuals about the
"truth" of the sounds they are analyzing.
soundevent provides three objects for working with predictions:
SoundEventPrediction, SequencePrediction, and ClipPrediction. Each stores a
prediction that estimates the true state of its corresponding entity. All
prediction objects share a similar structure: an overall confidence score that
expresses the uncertainty of the prediction, and a list of PredictedTags, which
resemble regular tags but carry an additional confidence score for each tag.
Sound Event Predictions
The SoundEventPrediction object represents a single sound event predicted through uncertain means. This object contains information about the predicted sound event, including an overall confidence score, and a list of predicted tags describing the anticipated characteristics of the sound event.
erDiagram
    SoundEventPrediction {
        UUID uuid
        float score
    }
    SoundEvent
    PredictedTag {
        float score
    }
    Tag
    SoundEventPrediction ||--|| SoundEvent : soundevent
    SoundEventPrediction ||--o{ PredictedTag : tags
    PredictedTag }|--|| Tag : tag
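The structure in the diagram above can be sketched with plain Python dataclasses. This is only an illustration of the data model, not the soundevent API: the classes below are hypothetical stand-ins defined in the snippet, the `sound_event_id` field is an assumed placeholder for the linked SoundEvent, and the `confident_tags` helper and the 0.5 threshold are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


# Hypothetical stand-ins mirroring the diagram; not the soundevent classes.
@dataclass(frozen=True)
class Tag:
    key: str
    value: str


@dataclass(frozen=True)
class PredictedTag:
    tag: Tag
    score: float  # confidence assigned to this particular tag


@dataclass
class SoundEventPrediction:
    sound_event_id: UUID  # assumed placeholder for the linked SoundEvent
    score: float  # overall confidence of the prediction
    tags: List[PredictedTag] = field(default_factory=list)
    uuid: UUID = field(default_factory=uuid4)


def confident_tags(prediction: SoundEventPrediction, threshold: float) -> List[Tag]:
    """Return the tags whose per-tag score reaches the threshold."""
    return [pt.tag for pt in prediction.tags if pt.score >= threshold]


prediction = SoundEventPrediction(
    sound_event_id=uuid4(),
    score=0.9,
    tags=[
        PredictedTag(Tag("species", "Myotis myotis"), score=0.8),
        PredictedTag(Tag("call_type", "echolocation"), score=0.4),
    ],
)

# Keep only the tags the (hypothetical) model is reasonably sure about.
kept = confident_tags(prediction, threshold=0.5)
```

Keeping the overall score separate from the per-tag scores lets a consumer first decide whether the sound event is credible at all, and then decide which of its tags to trust.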
Sequence Predictions
The SequencePrediction object represents a predicted sequence of sound events. Much like Sound Event Predictions, it encapsulates information about the sequence, offering an overall confidence score for the prediction and a list of PredictedTags.
erDiagram
    SequencePrediction {
        UUID uuid
        float score
    }
    Sequence
    PredictedTag {
        float score
    }
    Tag
    SequencePrediction ||--|| Sequence : sequence
    SequencePrediction ||--o{ PredictedTag : tags
    PredictedTag }|--|| Tag : tag
Clip Predictions
The ClipPrediction object encompasses all predictions made for the entire Clip, incorporating predictions for sound events or sequences. Like the preceding predictions, it retains fields for the overall confidence score and the predicted tags.
Interpreting predicted tags at the clip level is straightforward: they are tags that apply to the entire acoustic content, which is useful for tasks such as sound scene classification. The interpretation of the score field, however, differs from that of sound event or sequence predictions. Rather than expressing overall confidence in the presence of a sound event or sequence, the clip score can encode the confidence of a binary classification problem. A low score indicates that the clip would not be considered a positive example in that problem. How scores are used to encode this information is ultimately at the discretion of the user.
erDiagram
    ClipPrediction {
        UUID uuid
        float score
    }
    Clip
    PredictedTag {
        float score
    }
    Tag
    SoundEventPrediction
    SequencePrediction
    ClipPrediction }|--|| Clip : clip
    ClipPrediction ||--o{ PredictedTag : tags
    ClipPrediction ||--o{ SoundEventPrediction : sound_events
    ClipPrediction ||--o{ SequencePrediction : sequences
    PredictedTag }|--|| Tag : tag
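One possible reading of the clip score as binary classification confidence is sketched below. The `ClipPrediction` class here is a simplified, hypothetical stand-in (nested sound event and sequence predictions and tags are omitted), `clip_id` is an assumed placeholder for the linked Clip, and the `is_positive` helper with its 0.5 threshold is an arbitrary choice for illustration; as noted above, how scores are used is up to the user.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


# Simplified, hypothetical stand-in; not the soundevent ClipPrediction class.
@dataclass
class ClipPrediction:
    clip_id: UUID  # assumed placeholder for the linked Clip
    score: float  # here: confidence that the clip is a positive example
    uuid: UUID = field(default_factory=uuid4)


def is_positive(prediction: ClipPrediction, threshold: float = 0.5) -> bool:
    """Turn the clip score into a binary decision (threshold is arbitrary)."""
    return prediction.score >= threshold


clip_predictions: List[ClipPrediction] = [
    ClipPrediction(clip_id=uuid4(), score=0.92),  # likely a positive example
    ClipPrediction(clip_id=uuid4(), score=0.07),  # likely a negative example
]

labels = [is_positive(p) for p in clip_predictions]
```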
Model Runs
The ModelRun object in soundevent stores collections of predictions
originating from the same source. It includes a set of predictions together
with details such as the model's name, an optional version (for precise method
tracking), and a description of the method employed.
erDiagram
    ModelRun {
        UUID uuid
        datetime created_on
        str name
        str version
        str description
    }
    ClipPrediction
    ModelRun ||--|{ ClipPrediction : clip_predictions
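As a rough sketch of how a model run groups predictions from a single source, the snippet below mirrors the fields in the diagram with hypothetical dataclasses. These are not the soundevent classes: the model name, version string, description, and scores are made up for the example, and `clip_id` is an assumed placeholder for the linked Clip.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
from uuid import UUID, uuid4


# Hypothetical stand-ins mirroring the diagram; not the soundevent classes.
@dataclass
class ClipPrediction:
    clip_id: UUID  # assumed placeholder for the linked Clip
    score: float


@dataclass
class ModelRun:
    name: str
    description: str
    clip_predictions: List[ClipPrediction]
    version: Optional[str] = None  # optional, for precise method tracking
    created_on: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    uuid: UUID = field(default_factory=uuid4)


run = ModelRun(
    name="example-detector",  # made-up model name
    version="1.0",
    description="Hypothetical detector used only for illustration.",
    clip_predictions=[
        ClipPrediction(clip_id=uuid4(), score=0.75),
        ClipPrediction(clip_id=uuid4(), score=0.25),
    ],
)

# Because all predictions share one source, run-level summaries are easy.
mean_score = sum(p.score for p in run.clip_predictions) / len(run.clip_predictions)
```

Storing the name and version alongside the predictions makes it possible to compare outputs from different models, or different versions of the same model, on the same clips.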