Data Schemas

Welcome to the tour of the data schemas in the soundevent package! This overview groups the schemas the package provides into the following sections:

Describing the Data

soundevent provides tools to attach essential information to various objects in bioacoustic analysis:

  • Users: Keeping track of everyone's contributions.
  • Terms: Standardized vocabularies ensure consistent language.
  • Tags: Attaching semantic context to objects.
  • Features: Numerical descriptors capturing continuously varying attributes.
  • Notes: User-written free-text annotations.
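
To make the roles of tags, features, and notes concrete, here is a minimal sketch using plain Python dataclasses. The class and field names (`Tag`, `Feature`, `Note`, `Described`, `created_by`) are illustrative stand-ins chosen for this example. They are assumptions, not the soundevent API, so consult the package itself for the actual schema definitions.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for illustration only; not the soundevent classes.
@dataclass(frozen=True)
class Tag:
    key: str    # the kind of information, e.g. "species"
    value: str  # the categorical value attached under that key


@dataclass(frozen=True)
class Feature:
    name: str     # what is being measured, e.g. "duration"
    value: float  # a continuously varying numerical value


@dataclass(frozen=True)
class Note:
    message: str     # free text written by a user
    created_by: str  # who wrote the note


@dataclass
class Described:
    """Any object that can carry tags, features, and notes."""

    tags: List[Tag] = field(default_factory=list)
    features: List[Feature] = field(default_factory=list)
    notes: List[Note] = field(default_factory=list)


obj = Described(
    tags=[Tag("species", "Myotis myotis")],
    features=[Feature("duration", 0.012)],
    notes=[Note("Faint call, low signal-to-noise ratio.", created_by="annotator_1")],
)
```

The split mirrors the list above: tags hold categorical context, features hold numbers, and notes hold free text attributed to a user.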

Audio Content

At the core of acoustic analysis, we have schemas for:

  • Recordings: Complete audio files.
  • Dataset: A collection of recordings from a common source.

Acoustic Objects

Identifying distinctive sound elements within audio content:

  • Geometric Objects: Defining Regions of Interest (RoI) in the time-frequency plane.
  • Sound Events: Individual sonic occurrences.
  • Sequences: Patterns of connected sound events.
  • Clips: Fragments extracted from recordings.
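
As a rough illustration of a region of interest in the time-frequency plane, the sketch below models a bounding box and a clip with plain dataclasses. The names `BoundingBox`, `Clip`, and their fields are hypothetical and chosen for this example. They are not taken from soundevent, whose geometry schemas may be structured differently.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; not the soundevent classes.
@dataclass(frozen=True)
class BoundingBox:
    start_time: float  # seconds from the start of the recording
    low_freq: float    # hertz
    end_time: float    # seconds from the start of the recording
    high_freq: float   # hertz

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    @property
    def bandwidth(self) -> float:
        return self.high_freq - self.low_freq


@dataclass(frozen=True)
class Clip:
    start_time: float  # seconds
    end_time: float    # seconds

    def contains(self, box: BoundingBox) -> bool:
        """Return True if the box lies fully within the clip in time."""
        return self.start_time <= box.start_time and box.end_time <= self.end_time


# A sound event located between 1.0 s and 1.5 s, spanning 20 kHz to 45 kHz.
box = BoundingBox(start_time=1.0, low_freq=20_000.0, end_time=1.5, high_freq=45_000.0)
clip = Clip(start_time=0.0, end_time=3.0)

print(box.duration, box.bandwidth, clip.contains(box))
```

A sound event would pair such a geometry with the recording it was found in, and a sequence would group several sound events.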

Annotation

soundevent places emphasis on human annotation processes, covering:

Prediction

Automated processing methods also play a role, generating:

Evaluation

Assessing the accuracy of predictions is crucial, and soundevent provides schemas for:

  • Matches: Predicted sound events overlapping with ground truth.
  • Clip Evaluation: Information about matches and performance metrics at the clip level.
  • Evaluation: Comprehensive details on model performance across the entire evaluation set.
  • Evaluation Set: Human annotations serving as ground truth.
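
The idea of a match can be sketched by pairing predicted and ground-truth events according to how much they overlap in time. The example below uses a generic greedy matching on temporal intersection-over-union (IoU). It is an assumption for illustration, not the procedure soundevent implements, and the names `interval_iou`, `match_greedy`, and `threshold` are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_time, end_time) in seconds


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def match_greedy(
    predictions: List[Interval],
    ground_truth: List[Interval],
    threshold: float = 0.5,
) -> List[Tuple[int, Optional[int], float]]:
    """Pair each prediction with its best still-unmatched ground-truth event.

    Returns (prediction_index, truth_index or None, iou) for every prediction.
    """
    unmatched = set(range(len(ground_truth)))
    matches = []
    for i, pred in enumerate(predictions):
        best_j, best_iou = None, 0.0
        for j in unmatched:
            score = interval_iou(pred, ground_truth[j])
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= threshold:
            unmatched.remove(best_j)
            matches.append((i, best_j, best_iou))
        else:
            matches.append((i, None, 0.0))  # no sufficiently overlapping event
    return matches


truth = [(0.0, 1.0), (2.0, 3.0)]
preds = [(0.0, 1.0), (2.5, 3.5), (5.0, 6.0)]

matches = match_greedy(preds, truth, threshold=0.3)

# Clip-level metrics derived from the matches.
true_positives = sum(1 for _, j, _ in matches if j is not None)
precision = true_positives / len(preds)
recall = true_positives / len(truth)
```

Here the per-prediction tuples play the role of matches, while `precision` and `recall` are the kind of clip-level metrics a clip evaluation would record. An evaluation would then aggregate these across the whole evaluation set.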

Want to know more? Dive in for a closer look at each of these schemas.

Unique Identifiers

In soundevent, many objects have a field called uuid. This field stores a Universally Unique Identifier (UUID), a 128-bit label that is generated automatically. When generated by standard methods, UUIDs are unique for all practical purposes, so information labeled with UUIDs by different parties can later be combined into a single database with negligible risk of identifier collisions.
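
The merging property can be illustrated with Python's standard `uuid` module. This sketch assumes random (version 4) UUIDs and uses plain dictionaries as hypothetical records. It does not show how soundevent itself generates or stores the field.

```python
import uuid

# Two parties create records independently; each record gets its own UUID.
# The dictionary layout is a hypothetical example, not a soundevent schema.
record_a = {"uuid": uuid.uuid4(), "source": "lab_a"}
record_b = {"uuid": uuid.uuid4(), "source": "lab_b"}

# Merging into one database keyed by UUID: distinct records stay distinct,
# and re-importing the same record does not create a duplicate.
merged = {}
for record in (record_a, record_b, record_a):
    merged[record["uuid"]] = record

print(len(merged))  # two unique records, despite three imports
print(len(record_a["uuid"].bytes) * 8)  # size of the identifier in bits
```

Because the identifier travels with the record, no central authority is needed to assign keys before the two collections are combined.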