
# Acoustic Objects

In this section, we explore how soundevent handles the various "objects" found in acoustic content. To begin, Geometry objects offer a means to define regions of interest in the time-frequency plane. Sound events, the acoustic objects within the audio, are characterized by a geometry that delineates their location. Additionally, soundevent provides a way to group Sound Events into Sequences. Let's delve deeper into these concepts:

## Geometries

In soundevent, Geometry objects define precise regions of interest in the time-frequency plane. The package offers several geometry types, providing flexibility in how these regions are delineated:

  • TimeStamp: Represents a single point in time.

  • TimeInterval: Describes a time interval, specifying both starting and ending times.

  • Point: Pinpoints the exact location in time and frequency.

  • LineString: Describes an unbroken (potentially non-straight) line through a sequence of points.

  • Polygon: Defines a closed shape in time and frequency, possibly with holes.

  • BoundingBox: Represents a rectangle in time and frequency, offered separately due to its common use.

  • MultiPoint: Describes a collection of points.

  • MultiLineString: A collection of line strings.

  • MultiPolygon: A collection of polygons, useful for demarcating regions of interest that are interrupted by occluding sounds.

**Important Note on Time and Frequency Units**

Pay careful attention to the units used in soundevent geometries. Time is uniformly measured in seconds, and frequency is represented in hertz. It's crucial to emphasize that all time values are consistently referenced relative to the start of the recording. Adhering to these standardized units ensures smoother development of functions and interaction with geometry objects, built on reliable assumptions.
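Because geometries always use seconds (measured from the start of the recording) and hertz, values that originate as sample indices or spectrogram bins must be converted before building a geometry. The helpers below are a minimal sketch of such conversions; they are hypothetical and not part of the soundevent API.

```python
# Hypothetical helpers illustrating the unit conventions: geometries use
# seconds relative to the recording start, and frequencies in hertz.

def sample_to_seconds(sample_index: int, samplerate: int) -> float:
    """Convert a sample index into seconds from the start of the recording."""
    return sample_index / samplerate


def bin_to_hertz(bin_index: int, samplerate: int, n_fft: int) -> float:
    """Convert a spectrogram frequency-bin index into hertz."""
    return bin_index * samplerate / n_fft


# A call starting at sample 132300 of a 44.1 kHz recording begins 3 seconds in:
print(sample_to_seconds(132300, 44100))  # 3.0
```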

**Understanding Geometry Objects in soundevent**

soundevent adheres to the GeoJSON specification for structuring geometry objects. Every geometry object comprises a type field, indicating its specific geometry type, and a coordinates field defining its geometric properties. For further details and specifications, refer to the GeoJSON Specification.
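To make the "type plus coordinates" structure concrete, here are a few geometries written out as plain GeoJSON-style dictionaries. These are illustrative only; in soundevent the geometry types are model classes, but each serializes to a structure like the ones below. The coordinate ordering shown (time first, then frequency) is an assumption for illustration.

```python
# GeoJSON-style sketches of three geometry types. Times are in seconds from
# the start of the recording; frequencies are in hertz.

time_interval = {
    "type": "TimeInterval",
    "coordinates": [0.5, 1.2],  # start and end time in seconds
}

bounding_box = {
    "type": "BoundingBox",
    # [start_time (s), low_freq (Hz), end_time (s), high_freq (Hz)]
    "coordinates": [0.5, 2000.0, 1.2, 8000.0],
}

polygon = {
    "type": "Polygon",
    # One exterior ring of (time, frequency) pairs, closed on itself.
    "coordinates": [
        [[0.5, 2000.0], [1.2, 2000.0], [1.2, 8000.0], [0.5, 2000.0]]
    ],
}
```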

## Sound Events

Sound Events take the spotlight in this package, serving as the key players representing distinct sounds within the audio content. These "events" unfold within specific time intervals and frequency ranges, and the soundevent package ensures their precise localization using handy Geometry objects.

Adding a layer of richness to these Sound Events is the ability to characterize them through Features. These Features provide quantitative insights into various acoustic properties, ranging from basics like duration and bandwidth to peak frequency. You can attach any feature you fancy, including those extracted by Deep Learning models!

```mermaid
erDiagram
    SoundEvent {
        UUID uuid
    }
    Geometry
    Feature
    SoundEvent ||--|| Geometry : geometry
    SoundEvent ||--o{ Feature : features
```
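The relationships in the diagram can be sketched with a minimal dataclass: each sound event holds exactly one geometry and any number of named features. This mirrors the structure for illustration only; it is not the actual soundevent API.

```python
# Illustrative sketch of a sound event with one geometry and many features.
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Feature:
    name: str
    value: float


@dataclass
class SoundEvent:
    geometry: dict  # e.g. a GeoJSON-style mapping with "type" and "coordinates"
    features: list[Feature] = field(default_factory=list)
    uuid: UUID = field(default_factory=uuid4)


event = SoundEvent(
    geometry={"type": "TimeInterval", "coordinates": [0.5, 1.2]},
    features=[Feature("duration", 0.7), Feature("peak_frequency", 4500.0)],
)
```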

## Sequences

A Sequence in soundevent is essentially a collection of Sound Events, providing a flexible modeling tool that groups multiple Sound Events with a unifying relation. Researchers are free to decide which Sound Events constitute a sequence, customizing the structure to their specific research requirements. A Sequence can also specify a parent Sequence, supporting hierarchical arrangements: subsequences can be nested to represent the intricate relationships within complex sequences. Similar to Sound Events, Sequences can be described using Features, offering numerical insights into their acoustic properties.

```mermaid
erDiagram
    Sequence {
        UUID uuid
    }
    SoundEvent
    Feature
    Sequence }|--|{ SoundEvent : sound_events
    Sequence ||--o{ Feature : features
    Sequence }|--o| Sequence : parent
```
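The diagram's relationships can be sketched in a few lines: a sequence groups sound events, can carry features, and may reference an optional parent sequence. This is an illustrative structure under those assumptions, not the actual soundevent classes.

```python
# Illustrative sketch of nested sequences via an optional parent reference.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sequence:
    sound_events: list = field(default_factory=list)
    features: list = field(default_factory=list)
    parent: Optional["Sequence"] = None


song = Sequence(sound_events=["syllable_a", "syllable_b"])
phrase = Sequence(sound_events=["syllable_a"], parent=song)  # a subsequence
```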

## Clips

Clips in soundevent represent distinct fragments of a Recording, delineated by their start and end times. Serving as fundamental units for analysis and annotation tasks, Clips offer a more computationally efficient approach, particularly when working with lengthy audio files. Breaking down the Recording into manageable Clips not only enhances computational efficiency but also supports focused analysis on specific segments of interest. Standardizing Clip durations ensures consistency in annotations across diverse Recordings and facilitates easier interpretation and comparison of results in audio data. Many machine learning models process audio files in Clips, reinforcing the practical adoption of the Clip structure.

The exploration of a Clip's content is facilitated through Features attached to the Clip, providing numerical descriptors of its acoustic content. These features can vary widely, encompassing Acoustic Indices, simple descriptors of overall acoustic information, or even abstract features derived from Deep Learning models. You have the freedom to choose features that align with the specific requirements of your work, making it a flexible and customizable aspect of your analysis.

```mermaid
erDiagram
    Clip {
        UUID uuid
        float start_time
        float end_time
    }
    Recording
    Feature
    Clip }|--|| Recording : recording
    Clip }|--o{ Feature : features
```

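A clip, then, is just a time-bounded fragment of a recording, and standardized durations are what make annotations comparable across recordings. The sketch below illustrates that idea with a hypothetical structure; it is not the soundevent API.

```python
# Illustrative sketch: a clip is a fragment of a recording delimited by
# start and end times (in seconds from the start of the recording).
from dataclasses import dataclass


@dataclass
class Clip:
    recording: str      # stand-in for a reference to a Recording
    start_time: float
    end_time: float

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


# Split a 30-second recording into consistent 10-second clips:
clips = [Clip("rec.wav", t, t + 10.0) for t in (0.0, 10.0, 20.0)]
```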
**Understanding the Distinction between Clips and TimeInterval Sound Events**

While both Clip objects and TimeInterval Sound Events share a common feature of being defined by a specific start and end time, their purpose and usage significantly differ. Generally, TimeInterval Sound Events are designed to emphasize a segment of the audio content that corresponds to a distinct and cohesive sound. In contrast, Clips have no such restriction; they represent a subset of a recording without a specific reference to a single sound event. Clips are typically considered to encapsulate the entire acoustic content, acknowledging that they may contain multiple sound events or none at all. Therefore, discussions about clips generally revolve around the entirety of the acoustic material rather than focusing on a particular sound instance.