Saving and Loading data.#
In soundevent, we use the Acoustic Objects Exchange Format (AOEF) for
storing and exchanging audio objects. AOEF is a JSON-based format
specifically designed to standardize the representation of computational
bioacoustic data, enabling effective sharing and collaboration among
researchers.
Why JSON?
JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is widely supported across platforms and programming languages. It has a human-readable syntax and is commonly used in web applications, which makes it a good choice for data exchange.
We use AOEF to share common collections of audio objects, such as datasets, annotation projects, evaluation sets, model runs and performance evaluations.
To demonstrate how to save and load data in AOEF format, we provide examples below:
Datasets#
Suppose we have an example dataset stored in the AOEF format. The dataset is stored as a text file following the JSON structure. To view the contents of the file, you can use the following code.
import json
from pathlib import Path
dataset_path = Path("example_dataset.json")
with open(dataset_path) as file:
    dataset_contents = json.load(file)
print(json.dumps(dataset_contents, indent=2))
Out:
{
"version": "1.1.0",
"created_on": "2024-05-14T17:55:44.736382",
"data": {
"uuid": "b1096756-eea2-4489-9e6a-b98b559647bb",
"collection_type": "dataset",
"created_on": "2023-11-21T13:43:14.742002",
"recordings": [
{
"uuid": "89957d47-f67d-4bfe-8352-bf0fe5a8ce3e",
"path": "recording1.wav",
"duration": 10.0,
"channels": 1,
"samplerate": 44100,
"time_expansion": 10.0,
"hash": "1234567890abcdef",
"date": "2021-01-01",
"time": "21:34:56",
"latitude": 12.345,
"longitude": 34.567,
"tags": [
0,
1,
2
],
"features": {
"SNR": 10.0,
"ACI": 0.5
},
"notes": [
{
"uuid": "2931b864-43e4-4fb1-aae1-a214dccca6e3",
"message": "This is a note.",
"created_by": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
"is_issue": false,
"created_on": "2023-11-21T13:43:14.742073"
}
],
"owners": [
"d6eb0862-a619-4919-992c-eb3625692c13"
]
},
{
"uuid": "bd30f886-3abb-475b-aacb-c7148a4d4420",
"path": "recording2.wav",
"duration": 8.0,
"channels": 1,
"samplerate": 441000,
"time_expansion": 10.0,
"hash": "234567890abcdef1",
"date": "2021-01-02",
"time": "19:34:56",
"latitude": 13.345,
"longitude": 32.567,
"tags": [
3,
4,
5
],
"features": {
"SNR": 7.0,
"ACI": 0.3
},
"notes": [
{
"uuid": "713b6c15-0e3d-4cc5-acc6-3f1093209a40",
"message": "Unsure about the species.",
"created_by": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
"is_issue": false,
"created_on": "2023-11-21T13:43:14.742147"
}
],
"owners": [
"d6eb0862-a619-4919-992c-eb3625692c13"
]
}
],
"tags": [
{
"id": 0,
"key": "species",
"value": "Myotis myotis"
},
{
"id": 1,
"key": "sex",
"value": "female"
},
{
"id": 2,
"key": "behaviour",
"value": "foraging"
},
{
"id": 3,
"key": "species",
"value": "Eptesicus serotinus"
},
{
"id": 4,
"key": "sex",
"value": "male"
},
{
"id": 5,
"key": "behaviour",
"value": "social calls"
}
],
"users": [
{
"uuid": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
"name": "John Doe"
},
{
"uuid": "d6eb0862-a619-4919-992c-eb3625692c13",
"email": "data.collector@soundevent.org",
"name": "Data Collector"
}
],
"name": "test_dataset",
"description": "A test dataset"
}
}
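Note that each recording in the file above stores its tags as integers, while the key and value of each tag live in the top-level "tags" list. The following is a minimal sketch of resolving those references with plain Python. It assumes that the integers refer to the "id" field of the tag entries, which matches the tags printed for the first recording later on this page. The `aoef_data` dictionary is a trimmed copy of the example, and `resolve_recording_tags` is a hypothetical helper, not part of soundevent.

```python
# Trimmed copy of the "data" section shown above (one recording, three tags).
aoef_data = {
    "recordings": [
        {
            "uuid": "89957d47-f67d-4bfe-8352-bf0fe5a8ce3e",
            "path": "recording1.wav",
            "tags": [0, 1, 2],
        },
    ],
    "tags": [
        {"id": 0, "key": "species", "value": "Myotis myotis"},
        {"id": 1, "key": "sex", "value": "female"},
        {"id": 2, "key": "behaviour", "value": "foraging"},
    ],
}


def resolve_recording_tags(data):
    """Map each recording path to its list of (key, value) tag pairs.

    Hypothetical helper: assumes integer tag references point at the
    "id" field of the entries in the top-level "tags" list.
    """
    tags_by_id = {
        tag["id"]: (tag["key"], tag["value"]) for tag in data.get("tags", [])
    }
    return {
        recording["path"]: [
            tags_by_id[tag_id] for tag_id in recording.get("tags", [])
        ]
        for recording in data["recordings"]
    }


resolved = resolve_recording_tags(aoef_data)
print(resolved["recording1.wav"])
```

In practice the loading functions described below do this resolution for you; the sketch is only meant to show how the file is laid out.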
Loading Datasets#
By using the loading functions provided by the soundevent package, you can
directly load the data into Python and obtain a Dataset object.
Out:
Dataset(uuid=UUID('b1096756-eea2-4489-9e6a-b98b559647bb'), created_on=datetime.datetime(2023, 11, 21, 13, 43, 14, 742002), name='test_dataset', description='A test dataset')
The load function allows you to
access and analyze the dataset, which contains recordings and related
objects, all structured in a standardized and manageable way.
recording = dataset.recordings[0]
print(f"First recording: {recording!r}")
print(f"Recording tags: {recording.tags}")
Out:
First recording: Recording(path=PosixPath('recording1.wav'))
Recording tags: [Tag(term=Term(label='species'), value='Myotis myotis'), Tag(term=Term(label='sex'), value='female'), Tag(term=Term(label='behaviour'), value='foraging')]
Saving Datasets#
If you have your own dataset, you can save it to a file using the
save function. This function stores
the dataset in AOEF format, ensuring compatibility and easy sharing with
other researchers.
Annotation Projects#
Similar to loading datasets, you can also use the load function
to load annotations stored in the AOEF format.
Here we have transformed 10 random annotated recordings from the
NIPS4BPlus dataset into the
AOEF format and stored them in the nips4b_plus_sample.json file. You can
use the following code to view the annotations.
annotation_path = Path("nips4b_plus_sample.json")
with open(annotation_path) as file:
    annotation_contents = json.load(file)
print(json.dumps(annotation_contents, indent=2))
Out:
{
"version": "1.1.0",
"created_on": "2024-05-14T17:55:45.419210",
"data": {
"uuid": "c18624a1-8145-4657-a3f1-b3512134ecf6",
"collection_type": "annotation_set",
"recordings": [
{
"uuid": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
"path": "train/nips4b_birds_trainfile079.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "35e120d0-1633-4864-88c9-063aca992747",
"path": "train/nips4b_birds_trainfile237.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "7ee23c44-1bc6-4833-8d66-14aa4a8e8634",
"path": "train/nips4b_birds_trainfile587.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "24eceb91-535e-42f0-80e4-8c670465bac8",
"path": "train/nips4b_birds_trainfile106.wav",
"duration": 4.069297052154195,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "24dd23da-ca07-40eb-80e6-af8b6b2a75ee",
"path": "train/nips4b_birds_trainfile430.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "8954beed-21d0-4f20-98bd-58c14264d853",
"path": "train/nips4b_birds_trainfile661.wav",
"duration": 1.3873922902494331,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "69f7bf42-087a-4d88-b312-f26688597974",
"path": "train/nips4b_birds_trainfile429.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "b98b5a59-3518-45e5-9306-67c522540ae1",
"path": "train/nips4b_birds_trainfile633.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "52e329a3-afbb-475e-938c-f70c88580723",
"path": "train/nips4b_birds_trainfile200.wav",
"duration": 5.00390022675737,
"channels": 1,
"samplerate": 44100,
"owners": []
},
{
"uuid": "6eeca672-f3ec-41ca-a551-54ddec0dd1a7",
"path": "train/nips4b_birds_trainfile545.wav",
"duration": 2.7921995464852607,
"channels": 1,
"samplerate": 44100,
"owners": []
}
],
"clips": [
{
"uuid": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
"recording": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
"recording": "35e120d0-1633-4864-88c9-063aca992747",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "ab5bcc8f-078e-4194-97a4-763998d289fa",
"recording": "7ee23c44-1bc6-4833-8d66-14aa4a8e8634",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "dd9ad126-8b15-4121-84f5-f59dc3f85802",
"recording": "24eceb91-535e-42f0-80e4-8c670465bac8",
"start_time": 0.0,
"end_time": 4.069297052154195
},
{
"uuid": "0a53e14f-f51c-4ade-9567-87c2b02bd197",
"recording": "24dd23da-ca07-40eb-80e6-af8b6b2a75ee",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "ae455d97-d1fa-43a1-b179-907bee8471e6",
"recording": "8954beed-21d0-4f20-98bd-58c14264d853",
"start_time": 0.0,
"end_time": 1.3873922902494331
},
{
"uuid": "0b42c9e1-0289-4f03-b3cb-10859a80103e",
"recording": "69f7bf42-087a-4d88-b312-f26688597974",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "56329c3c-5beb-48ed-8a79-be4e9bd2d9ed",
"recording": "b98b5a59-3518-45e5-9306-67c522540ae1",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "57e0b23a-8663-49f3-8455-997e0f4d2b49",
"recording": "52e329a3-afbb-475e-938c-f70c88580723",
"start_time": 0.0,
"end_time": 5.00390022675737
},
{
"uuid": "c7b6f927-fd3b-40ac-9b25-d682eec2d3ac",
"recording": "6eeca672-f3ec-41ca-a551-54ddec0dd1a7",
"start_time": 0.0,
"end_time": 2.7921995464852607
}
],
"clip_annotations": [
{
"uuid": "0e7a786e-48e4-4424-a8da-bf080bdefd9e",
"clip": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
"created_on": "2023-11-23T20:44:32.913233"
},
{
"uuid": "af9713a7-2bf4-45df-bdff-63d47de34d71",
"clip": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
"created_on": "2023-11-23T20:44:32.913255"
},
{
"uuid": "70738dff-3c98-4838-979c-0ed073edd0ea",
"clip": "ab5bcc8f-078e-4194-97a4-763998d289fa",
"created_on": "2023-11-23T20:44:32.913264"
},
{
"uuid": "fa0fc36e-0c1a-450f-900b-5a32df17a159",
"clip": "dd9ad126-8b15-4121-84f5-f59dc3f85802",
"created_on": "2023-11-23T20:44:32.913377"
},
{
"uuid": "178bf156-2bab-4970-8a76-4abfdc4b31b7",
"clip": "0a53e14f-f51c-4ade-9567-87c2b02bd197",
"created_on": "2023-11-23T20:44:32.913386"
},
{
"uuid": "dc71d06a-54d6-4fad-ac6a-205bbee7ec96",
"clip": "ae455d97-d1fa-43a1-b179-907bee8471e6",
"created_on": "2023-11-23T20:44:32.913392"
},
{
"uuid": "b418abd6-52cc-4ef4-9725-33d3d38b7878",
"clip": "0b42c9e1-0289-4f03-b3cb-10859a80103e",
"created_on": "2023-11-23T20:44:32.913400"
},
{
"uuid": "d18ca56f-2f13-44dd-be43-f995e4d2edb6",
"clip": "56329c3c-5beb-48ed-8a79-be4e9bd2d9ed",
"created_on": "2023-11-23T20:44:32.913415"
},
{
"uuid": "8dfa7d26-5f8b-4f37-bf0a-c031373b22f7",
"clip": "57e0b23a-8663-49f3-8455-997e0f4d2b49",
"created_on": "2023-11-23T20:44:32.913430"
},
{
"uuid": "ebc5e8d0-bd3e-4d01-95e0-d9a0ce337ac9",
"clip": "c7b6f927-fd3b-40ac-9b25-d682eec2d3ac",
"created_on": "2023-11-23T20:44:32.913437"
}
],
"created_on": "2023-11-23T20:44:32.913488"
}
}
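In the file above, clip annotations point to clips, and clips point to recordings, in both cases by UUID. A minimal sketch of following those references with plain Python is given below. The `aoef_data` dictionary is a trimmed copy of two entries from the example, and `describe_clip_annotations` is a hypothetical helper, not part of soundevent.

```python
# Trimmed copy of the "data" section shown above (two of the ten entries).
aoef_data = {
    "recordings": [
        {
            "uuid": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
            "path": "train/nips4b_birds_trainfile079.wav",
        },
        {
            "uuid": "8954beed-21d0-4f20-98bd-58c14264d853",
            "path": "train/nips4b_birds_trainfile661.wav",
        },
    ],
    "clips": [
        {
            "uuid": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
            "recording": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
            "start_time": 0.0,
            "end_time": 5.00390022675737,
        },
        {
            "uuid": "ae455d97-d1fa-43a1-b179-907bee8471e6",
            "recording": "8954beed-21d0-4f20-98bd-58c14264d853",
            "start_time": 0.0,
            "end_time": 1.3873922902494331,
        },
    ],
    "clip_annotations": [
        {
            "uuid": "0e7a786e-48e4-4424-a8da-bf080bdefd9e",
            "clip": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
        },
        {
            "uuid": "dc71d06a-54d6-4fad-ac6a-205bbee7ec96",
            "clip": "ae455d97-d1fa-43a1-b179-907bee8471e6",
        },
    ],
}


def describe_clip_annotations(data):
    """Return (recording path, start, end) for every clip annotation."""
    # Index recordings and clips by UUID so references can be looked up.
    recordings = {rec["uuid"]: rec for rec in data["recordings"]}
    clips = {clip["uuid"]: clip for clip in data["clips"]}

    rows = []
    for annotation in data["clip_annotations"]:
        clip = clips[annotation["clip"]]
        recording = recordings[clip["recording"]]
        rows.append((recording["path"], clip["start_time"], clip["end_time"]))
    return rows


rows = describe_clip_annotations(aoef_data)
for path, start_time, end_time in rows:
    print(f"* Recording {path} [from {start_time:.3f}s to {end_time:.3f}s]")
```

Storing each object once and referring to it by UUID keeps the file compact, at the cost of this extra lookup step when reading the raw JSON by hand.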
Loading Annotation Projects#
The load function can be used to load the annotations into Python and obtain an
AnnotationProject object directly.
This object allows you to access and analyze the annotations, along with their associated objects.
# nips4b_sample is the object obtained by loading nips4b_plus_sample.json
for clip_annotation in nips4b_sample.clip_annotations:
    clip = clip_annotation.clip
    recording = clip.recording
    print(
        f"* Recording {recording.path} [from "
        f"{clip.start_time:.3f}s to {clip.end_time:.3f}s]"
    )
    print(
        f"\t{len(clip_annotation.sound_events)} sound event annotations found"
    )
    for annotation in clip_annotation.sound_events:
        sound_event = annotation.sound_event
        start_time, end_time = sound_event.geometry.coordinates
        print(f"\t+ Sound event from {start_time:.3f}s to {end_time:.3f}s")
        for tag in annotation.tags:
            print(f"\t\t- {tag}")
        print("")
Out:
* Recording train/nips4b_birds_trainfile079.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile237.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile587.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile106.wav [from 0.000s to 4.069s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile430.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile661.wav [from 0.000s to 1.387s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile429.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile633.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile200.wav [from 0.000s to 5.004s]
0 sound event annotations found
* Recording train/nips4b_birds_trainfile545.wav [from 0.000s to 2.792s]
0 sound event annotations found
Saving Annotation Projects#
Saving the annotation project is just as straightforward using the
save function.
Model Runs#
Finally, the outputs of a model run can also be stored in the AOEF
format. You can save and load model runs using the
save and load functions, respectively. The
loading function reads the AOEF file and returns a
ModelRun object that can be
used for further analysis.
By using the saving and loading functions provided by soundevent, you can manage and exchange acoustic data objects in AOEF format, which makes it easier to share them with other researchers.