Integrating Crowsetta with Soundevent#
Crowsetta is a versatile Python tool for handling annotations of animal vocalizations and bioacoustics data. If you're working with diverse annotation formats, Crowsetta has you covered. Soundevent complements this functionality with its soundevent.io.crowsetta module, which converts between Crowsetta and Soundevent formats.
In this guide, we'll use Crowsetta to load annotations and then convert them to Soundevent format using the soundevent.io.crowsetta module.
Usage details
To use the soundevent.io.crowsetta module you need to install some additional dependencies. You can do this by running the following command:
Loading annotations with crowsetta#
Let's start by loading annotations using Crowsetta.
Crowsetta supported formats#
Crowsetta offers support for various annotation formats. Let's explore the available formats:
Out:
['aud-txt', 'birdsong-recognition-dataset', 'generic-seq', 'notmat', 'raven', 'simple-seq', 'textgrid', 'timit']
Loading Example Raven Annotations#
Let's walk through the process of loading example Raven annotations using Crowsetta.
import os
import tempfile

import crowsetta

with tempfile.TemporaryDirectory() as tmpdirname:
    # Extract the example data files to a temporary directory
    data_dir = os.path.join(tmpdirname, "crowsetta_data")
    crowsetta.data.extract_data_files(user_data_dir=data_dir)
    # Select a Raven example file
    example_file = crowsetta.data.get("raven", user_data_dir=data_dir)
    # Create a Raven transcriber
    transcriber = crowsetta.Transcriber("raven")
    # Load the Raven annotations
    # For this example, we assume the annotations correspond to a test audio
    # file.
    raven_annotations = transcriber.from_file(
        example_file.annot_path,
        annot_col="Species",
        audio_path="sample_audio.wav",
    )
    print(raven_annotations)
    # Convert the annotations to the standard crowsetta format
    annotations = raven_annotations.to_annot()
    print(f"Citation: {example_file.citation}")
    print(f"Loaded {len(annotations.bboxes)} bounding box annotations")
    print("Notated file: ", annotations.notated_path)
Out:
Raven(df= Selection View Channel ... low_freq_hz high_freq_hz annotation
0 1 Spectrogram 1 1 ... 2878.2 4049.0 EATO
1 2 Spectrogram 1 1 ... 2731.9 3902.7 EATO
2 3 Spectrogram 1 1 ... 2878.2 3975.8 EATO
3 4 Spectrogram 1 1 ... 2756.2 3951.4 EATO
4 5 Spectrogram 1 1 ... 2707.5 3975.8 EATO
5 6 Spectrogram 1 1 ... 2951.4 3975.8 EATO
[6 rows x 8 columns], annot_path=PosixPath('/tmp/tmp609k77z1/crowsetta_data/raven/Recording_1_Segment_02.Table.1.selections.txt'), annot_col='Species', audio_path=PosixPath('sample_audio.wav'))
Citation: Chronister, L. M., Rhinehart, T. A., Place, A., & Kitzes, J. (2021). An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information.https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z
Loaded 6 bounding box annotations
Notated file: sample_audio.wav
Converting to Soundevent format#
Having successfully loaded the annotations using Crowsetta, we're now ready to convert them to Soundevent format.
import soundevent.io.crowsetta as cr

# Convert Crowsetta annotations to a Soundevent ClipAnnotation
clip_annotation = cr.annotation_to_clip_annotation(annotations)
# Print the JSON representation of the ClipAnnotation object
print(
    clip_annotation.model_dump_json(
        indent=2,
        # Avoid printing irrelevant information
        exclude_none=True,
        exclude_defaults=True,
        exclude_unset=True,
    )
)
Out:
{
"clip": {
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
},
"start_time": 0.0,
"end_time": 3.0
},
"sound_events": [
{
"sound_event": {
"geometry": {
"coordinates": [
154.387792767,
2878.2,
154.911598217,
4049.0
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
},
{
"sound_event": {
"geometry": {
"coordinates": [
167.526598245,
2731.9,
168.17302044,
3902.7
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
},
{
"sound_event": {
"geometry": {
"coordinates": [
183.609636834,
2878.2,
184.097751553,
3975.8
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
},
{
"sound_event": {
"geometry": {
"coordinates": [
250.527480604,
2756.2,
251.160710509,
3951.4
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
},
{
"sound_event": {
"geometry": {
"coordinates": [
277.88724277,
2707.5,
278.480895806,
3975.8
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
},
{
"sound_event": {
"geometry": {
"coordinates": [
295.52970757,
2951.4,
296.110168316,
3975.8
]
},
"recording": {
"path": "sample_audio.wav",
"duration": 3.0,
"channels": 1,
"samplerate": 22050,
"hash": "7df7fabc84c9fa3d235db620c38ef288"
}
},
"tags": [
{
"term": {
"label": "crowsetta",
"definition": "Unknown",
"name": "soundevent:crowsetta"
},
"value": "EATO"
}
]
}
]
}
And that's it! We have successfully loaded annotations using crowsetta and converted them to soundevent format.
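Because the dump above is plain JSON, it can be inspected with the standard library alone. The sketch below is not part of soundevent. It parses a trimmed stand-in for the printed output (two of the six events, with geometry and recording left out) and counts how often each tag value occurs. The function name `count_tag_values` is hypothetical, and the field names are taken from the output shown above.

```python
import json
from collections import Counter

# Trimmed stand-in for the JSON printed above: two of the six events,
# with the geometry and recording fields omitted for brevity.
dumped = """
{
  "sound_events": [
    {
      "tags": [
        {"term": {"name": "soundevent:crowsetta"}, "value": "EATO"}
      ]
    },
    {
      "tags": [
        {"term": {"name": "soundevent:crowsetta"}, "value": "EATO"}
      ]
    }
  ]
}
"""


def count_tag_values(json_text):
    """Count (term name, value) pairs over all sound event annotations."""
    data = json.loads(json_text)
    counts = Counter()
    for annotation in data["sound_events"]:
        # Annotations without tags are simply skipped.
        for tag in annotation.get("tags", []):
            counts[(tag["term"]["name"], tag["value"])] += 1
    return counts


counts = count_tag_values(dumped)
for (term_name, value), n in counts.items():
    print(f"{term_name} = {value}: {n} event(s)")
```

A quick count like this is a cheap sanity check that every label in the original annotation file survived the conversion.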
Converting back to crowsetta format#
Now, let's explore the process of converting Soundevent annotations back to Crowsetta objects.
from soundevent.data import (
    BoundingBox,
    Clip,
    ClipAnnotation,
    Recording,
    SoundEvent,
    SoundEventAnnotation,
    Tag,
)

# First, let's create some annotations for the example audio file
recording = Recording.from_file("sample_audio.wav")
clip_annotation = ClipAnnotation(
    clip=Clip(
        recording=recording,
        start_time=0,
        end_time=1,
    ),
    sound_events=[
        SoundEventAnnotation(
            tags=[
                Tag(key="species", value="bird"),
                Tag(key="color", value="red"),
            ],
            sound_event=SoundEvent(
                recording=recording,
                geometry=BoundingBox(coordinates=[0.1, 2000, 0.2, 3000]),
            ),
        ),
        SoundEventAnnotation(
            tags=[Tag(key="species", value="frog")],
            sound_event=SoundEvent(
                recording=recording,
                geometry=BoundingBox(coordinates=[0.3, 1000, 0.6, 1500]),
            ),
        ),
    ],
)
Out:
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/pydantic/main.py:212: DeprecationWarning: The 'key' field is deprecated. Please use 'term' instead.
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
Now, let's convert the ClipAnnotation object to Crowsetta format:
annotations = cr.annotation_from_clip_annotation(
    clip_annotation,
    "random_file_path.txt",
    annotation_fmt="bbox",
)
print(annotations)
Out:
Annotation(annot_path=PosixPath('random_file_path.txt'), notated_path=PosixPath('sample_audio.wav'), bboxes=[BBox(onset=0.1, offset=0.2, low_freq=2000.0, high_freq=3000.0, label='species:bird,color:red'), BBox(onset=0.3, offset=0.6, low_freq=1000.0, high_freq=1500.0, label='species:frog')])
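Comparing the input coordinates with the printed result suggests how the geometry is mapped: a bounding box given as `[0.1, 2000, 0.2, 3000]` comes out with `onset=0.1`, `offset=0.2`, `low_freq=2000.0` and `high_freq=3000.0`. The sketch below spells out that reordering with a stand-in dataclass. `BBoxSketch` and both helper functions are hypothetical names for illustration, not the crowsetta or soundevent classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBoxSketch:
    """Stand-in for a crowsetta-style bounding box (not the real class)."""

    onset: float
    offset: float
    low_freq: float
    high_freq: float
    label: str


def bbox_from_coordinates(coordinates, label):
    """Reorder [start_time, low_freq, end_time, high_freq] into named fields.

    The coordinate order is read off the example above; treat it as an
    assumption of this sketch.
    """
    start_time, low_freq, end_time, high_freq = coordinates
    return BBoxSketch(
        onset=float(start_time),
        offset=float(end_time),
        low_freq=float(low_freq),
        high_freq=float(high_freq),
        label=label,
    )


def coordinates_from_bbox(bbox):
    """Inverse mapping: named fields back to a flat coordinate list."""
    return [bbox.onset, bbox.low_freq, bbox.offset, bbox.high_freq]


box = bbox_from_coordinates([0.1, 2000, 0.2, 3000], "species:bird,color:red")
print(box)
print(coordinates_from_bbox(box))
```

Note that time and frequency values alternate in the flat list, so mixing up the second and third entries silently produces a box with the wrong shape.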
Note
When working with crowsetta, annotation objects are typically loaded from a file. In this demonstration, we pass a placeholder file name to instantiate the annotations, even though the file doesn't exist. crowsetta requires a file path to create annotations, even if nothing is ever written to that path, so using a placeholder path here is safe.
Finer Control#
When converting between crowsetta and soundevent formats, you have a multitude of options at your disposal. Soundevent objects can contain a wealth of information beyond what crowsetta objects offer, including multiple tags, notes, various geometry types, and more. Consequently, the conversion process isn't always straightforward. Particularly when converting from soundevent to crowsetta format, you'll need to make decisions regarding how to handle the additional information.
Tags and Labels#
One of the primary distinctions between crowsetta and soundevent lies in their handling of labels/tags. While crowsetta employs a single textual label for each annotation, soundevent utilizes a list of key-value tags. This difference complicates the conversion process.
By default, when converting from crowsetta to soundevent format, the label field of the crowsetta annotation is converted to a single tag with the key crowsetta and the label as its value. Several customization options are available to tailor this behavior. Refer to the documentation for more information.
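To make the default concrete, here is a minimal sketch in plain Python of turning one crowsetta label into key-value pairs on the way to soundevent. It does not call soundevent. The function name `tags_from_label_sketch`, its `mapping` parameter, and the `source` key are hypothetical and only illustrate the idea of a default wrapper plus one possible customization.

```python
def tags_from_label_sketch(label, mapping=None, fallback_key="crowsetta"):
    """Turn a single label into a list of (key, value) pairs.

    With no mapping, the label is wrapped in one pair under
    `fallback_key`, mirroring the default described above.
    A hypothetical `mapping` lets specific labels expand into
    several pairs instead.
    """
    if mapping is not None and label in mapping:
        return list(mapping[label])
    return [(fallback_key, label)]


# Default behavior: one pair, keyed by "crowsetta".
default_pairs = tags_from_label_sketch("EATO")
print(default_pairs)

# Hypothetical customization: expand a known label into richer pairs.
custom_mapping = {"EATO": [("species", "EATO"), ("source", "raven")]}
custom_pairs = tags_from_label_sketch("EATO", mapping=custom_mapping)
print(custom_pairs)
```

The fallback keeps unknown labels from being dropped, at the cost of producing pairs with an uninformative key.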
In the reverse direction, the default behavior amalgamates all tags into a single label. For example:
tags = [
    Tag(key="species", value="bird"),
    Tag(key="color", value="red"),
]
label = cr.label_from_tags(tags)
print(label)
Out:
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/pydantic/main.py:212: DeprecationWarning: The 'key' field is deprecated. Please use 'term' instead.
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
species:bird,color:red
Once again, you have the option to customize this behavior. Refer to the documentation for more information.
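The joined label shown above has a simple shape: each tag becomes `key:value`, and the pieces are separated by commas. The following sketch reproduces that shape with plain tuples and adds a parser for the reverse direction. Both function names and the separator parameters are hypothetical, and the parser is an illustration rather than something soundevent is known to provide.

```python
def label_from_pairs(pairs, key_value_sep=":", tag_sep=","):
    """Join (key, value) pairs into one label string."""
    return tag_sep.join(f"{key}{key_value_sep}{value}" for key, value in pairs)


def pairs_from_label(label, key_value_sep=":", tag_sep=","):
    """Split a joined label back into (key, value) pairs."""
    pairs = []
    for chunk in label.split(tag_sep):
        key, sep, value = chunk.partition(key_value_sep)
        if not sep:
            # No key/value separator found: the chunk is not a pair.
            raise ValueError(f"cannot split {chunk!r} into key and value")
        pairs.append((key, value))
    return pairs


pairs = [("species", "bird"), ("color", "red")]
label = label_from_pairs(pairs)
print(label)  # species:bird,color:red
print(pairs_from_label(label))
```

The round trip only works when keys and values contain neither separator. A value such as `red,blue` would be split into two chunks, which is one reason a custom label format can be worth choosing.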