Saving and Loading Data

In soundevent, we use the Acoustic Objects Exchange Format (AOEF) for storing and exchanging audio objects. AOEF is a JSON-based format specifically designed to standardize the representation of computational bioacoustic data, enabling effective sharing and collaboration among researchers.

Why JSON?

JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is widely supported across platforms and programming languages. It has a human-readable syntax and is commonly used in web applications, making it a good fit for data exchange.

We use AOEF to share common collections of audio objects, such as datasets, annotation projects, evaluation sets, model runs and performance evaluations.

To demonstrate how to save and load data in AOEF format, we provide examples below:

Datasets

Suppose we have an example dataset stored in the AOEF format. The dataset is stored as a text file following the JSON structure. To view the contents of the file, you can use the following code.

import json
from pathlib import Path

dataset_path = Path("example_dataset.json")
with open(dataset_path) as file:
    dataset_contents = json.load(file)

print(json.dumps(dataset_contents, indent=2))

Out:

{
  "version": "1.1.0",
  "created_on": "2024-05-14T17:55:44.736382",
  "data": {
    "uuid": "b1096756-eea2-4489-9e6a-b98b559647bb",
    "collection_type": "dataset",
    "created_on": "2023-11-21T13:43:14.742002",
    "recordings": [
      {
        "uuid": "89957d47-f67d-4bfe-8352-bf0fe5a8ce3e",
        "path": "recording1.wav",
        "duration": 10.0,
        "channels": 1,
        "samplerate": 44100,
        "time_expansion": 10.0,
        "hash": "1234567890abcdef",
        "date": "2021-01-01",
        "time": "21:34:56",
        "latitude": 12.345,
        "longitude": 34.567,
        "tags": [
          0,
          1,
          2
        ],
        "features": {
          "SNR": 10.0,
          "ACI": 0.5
        },
        "notes": [
          {
            "uuid": "2931b864-43e4-4fb1-aae1-a214dccca6e3",
            "message": "This is a note.",
            "created_by": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
            "is_issue": false,
            "created_on": "2023-11-21T13:43:14.742073"
          }
        ],
        "owners": [
          "d6eb0862-a619-4919-992c-eb3625692c13"
        ]
      },
      {
        "uuid": "bd30f886-3abb-475b-aacb-c7148a4d4420",
        "path": "recording2.wav",
        "duration": 8.0,
        "channels": 1,
        "samplerate": 441000,
        "time_expansion": 10.0,
        "hash": "234567890abcdef1",
        "date": "2021-01-02",
        "time": "19:34:56",
        "latitude": 13.345,
        "longitude": 32.567,
        "tags": [
          3,
          4,
          5
        ],
        "features": {
          "SNR": 7.0,
          "ACI": 0.3
        },
        "notes": [
          {
            "uuid": "713b6c15-0e3d-4cc5-acc6-3f1093209a40",
            "message": "Unsure about the species.",
            "created_by": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
            "is_issue": false,
            "created_on": "2023-11-21T13:43:14.742147"
          }
        ],
        "owners": [
          "d6eb0862-a619-4919-992c-eb3625692c13"
        ]
      }
    ],
    "tags": [
      {
        "id": 0,
        "key": "species",
        "value": "Myotis myotis"
      },
      {
        "id": 1,
        "key": "sex",
        "value": "female"
      },
      {
        "id": 2,
        "key": "behaviour",
        "value": "foraging"
      },
      {
        "id": 3,
        "key": "species",
        "value": "Eptesicus serotinus"
      },
      {
        "id": 4,
        "key": "sex",
        "value": "male"
      },
      {
        "id": 5,
        "key": "behaviour",
        "value": "social calls"
      }
    ],
    "users": [
      {
        "uuid": "04ef3927-3a3d-40df-9d6e-2cc5e21482a0",
        "name": "John Doe"
      },
      {
        "uuid": "d6eb0862-a619-4919-992c-eb3625692c13",
        "email": "data.collector@soundevent.org",
        "name": "Data Collector"
      }
    ],
    "name": "test_dataset",
    "description": "A test dataset"
  }
}
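In the file above, each recording lists its tags as integer ids that point into the "tags" table under "data". The following is a minimal sketch, using only the standard library and a trimmed copy of that structure, of how these ids could be resolved by hand. The helper name resolve_tags is hypothetical and not part of soundevent.

```python
import json

# Trimmed copy of the AOEF structure shown above (illustrative only).
aoef_text = """
{
  "data": {
    "collection_type": "dataset",
    "recordings": [
      {"path": "recording1.wav", "tags": [0, 1, 2]},
      {"path": "recording2.wav", "tags": [3, 4, 5]}
    ],
    "tags": [
      {"id": 0, "key": "species", "value": "Myotis myotis"},
      {"id": 1, "key": "sex", "value": "female"},
      {"id": 2, "key": "behaviour", "value": "foraging"},
      {"id": 3, "key": "species", "value": "Eptesicus serotinus"},
      {"id": 4, "key": "sex", "value": "male"},
      {"id": 5, "key": "behaviour", "value": "social calls"}
    ]
  }
}
"""


def resolve_tags(aoef: dict) -> dict:
    """Map each recording path to its (key, value) tag pairs.

    Hypothetical helper: assumes the layout shown in the example file.
    """
    data = aoef["data"]
    # Index the shared tag table by id so each lookup is a dict access.
    tags_by_id = {tag["id"]: (tag["key"], tag["value"]) for tag in data["tags"]}
    return {
        rec["path"]: [tags_by_id[tag_id] for tag_id in rec.get("tags", [])]
        for rec in data["recordings"]
    }


resolved = resolve_tags(json.loads(aoef_text))
for path, tags in resolved.items():
    print(path, tags)
```

Storing each tag once and referring to it by id keeps the file compact when many recordings share the same tags.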

Loading Datasets

By using the loading functions provided by the soundevent package, you can directly load the data into Python and obtain a Dataset object.

from soundevent import io

dataset = io.load(dataset_path)
print(repr(dataset))

Out:

Dataset(uuid=UUID('b1096756-eea2-4489-9e6a-b98b559647bb'), created_on=datetime.datetime(2023, 11, 21, 13, 43, 14, 742002), name='test_dataset', description='A test dataset')

The resulting Dataset object gives you access to the recordings and their related objects, such as tags, in a standardized structure.

recording = dataset.recordings[0]
print(f"First recording: {recording!r}")
print(f"Recording tags: {recording.tags}")

Out:

First recording: Recording(path=PosixPath('recording1.wav'))
Recording tags: [Tag(term=Term(label='species'), value='Myotis myotis'), Tag(term=Term(label='sex'), value='female'), Tag(term=Term(label='behaviour'), value='foraging')]

Saving Datasets

If you have your own dataset, you can save it to a file using the save function. This function stores the dataset in AOEF format, ensuring compatibility and easy sharing with other researchers.

io.save(dataset, dataset_path)

Annotation Projects

Similar to loading datasets, you can also use the load function to load annotations stored in the AOEF format.

Here we have transformed 10 random annotated recordings from the NIPS4BPlus dataset into the AOEF format and stored them in the nips4b_plus_sample.json file. You can use the following code to view the annotations.

annotation_path = Path("nips4b_plus_sample.json")
with open(annotation_path) as file:
    annotation_contents = json.load(file)

print(json.dumps(annotation_contents, indent=2))

Out:

{
  "version": "1.1.0",
  "created_on": "2024-05-14T17:55:45.419210",
  "data": {
    "uuid": "c18624a1-8145-4657-a3f1-b3512134ecf6",
    "collection_type": "annotation_set",
    "recordings": [
      {
        "uuid": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
        "path": "train/nips4b_birds_trainfile079.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "35e120d0-1633-4864-88c9-063aca992747",
        "path": "train/nips4b_birds_trainfile237.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "7ee23c44-1bc6-4833-8d66-14aa4a8e8634",
        "path": "train/nips4b_birds_trainfile587.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "24eceb91-535e-42f0-80e4-8c670465bac8",
        "path": "train/nips4b_birds_trainfile106.wav",
        "duration": 4.069297052154195,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "24dd23da-ca07-40eb-80e6-af8b6b2a75ee",
        "path": "train/nips4b_birds_trainfile430.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "8954beed-21d0-4f20-98bd-58c14264d853",
        "path": "train/nips4b_birds_trainfile661.wav",
        "duration": 1.3873922902494331,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "69f7bf42-087a-4d88-b312-f26688597974",
        "path": "train/nips4b_birds_trainfile429.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "b98b5a59-3518-45e5-9306-67c522540ae1",
        "path": "train/nips4b_birds_trainfile633.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "52e329a3-afbb-475e-938c-f70c88580723",
        "path": "train/nips4b_birds_trainfile200.wav",
        "duration": 5.00390022675737,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      },
      {
        "uuid": "6eeca672-f3ec-41ca-a551-54ddec0dd1a7",
        "path": "train/nips4b_birds_trainfile545.wav",
        "duration": 2.7921995464852607,
        "channels": 1,
        "samplerate": 44100,
        "owners": []
      }
    ],
    "clips": [
      {
        "uuid": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
        "recording": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
        "recording": "35e120d0-1633-4864-88c9-063aca992747",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "ab5bcc8f-078e-4194-97a4-763998d289fa",
        "recording": "7ee23c44-1bc6-4833-8d66-14aa4a8e8634",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "dd9ad126-8b15-4121-84f5-f59dc3f85802",
        "recording": "24eceb91-535e-42f0-80e4-8c670465bac8",
        "start_time": 0.0,
        "end_time": 4.069297052154195
      },
      {
        "uuid": "0a53e14f-f51c-4ade-9567-87c2b02bd197",
        "recording": "24dd23da-ca07-40eb-80e6-af8b6b2a75ee",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "ae455d97-d1fa-43a1-b179-907bee8471e6",
        "recording": "8954beed-21d0-4f20-98bd-58c14264d853",
        "start_time": 0.0,
        "end_time": 1.3873922902494331
      },
      {
        "uuid": "0b42c9e1-0289-4f03-b3cb-10859a80103e",
        "recording": "69f7bf42-087a-4d88-b312-f26688597974",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "56329c3c-5beb-48ed-8a79-be4e9bd2d9ed",
        "recording": "b98b5a59-3518-45e5-9306-67c522540ae1",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "57e0b23a-8663-49f3-8455-997e0f4d2b49",
        "recording": "52e329a3-afbb-475e-938c-f70c88580723",
        "start_time": 0.0,
        "end_time": 5.00390022675737
      },
      {
        "uuid": "c7b6f927-fd3b-40ac-9b25-d682eec2d3ac",
        "recording": "6eeca672-f3ec-41ca-a551-54ddec0dd1a7",
        "start_time": 0.0,
        "end_time": 2.7921995464852607
      }
    ],
    "clip_annotations": [
      {
        "uuid": "0e7a786e-48e4-4424-a8da-bf080bdefd9e",
        "clip": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
        "created_on": "2023-11-23T20:44:32.913233"
      },
      {
        "uuid": "af9713a7-2bf4-45df-bdff-63d47de34d71",
        "clip": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
        "created_on": "2023-11-23T20:44:32.913255"
      },
      {
        "uuid": "70738dff-3c98-4838-979c-0ed073edd0ea",
        "clip": "ab5bcc8f-078e-4194-97a4-763998d289fa",
        "created_on": "2023-11-23T20:44:32.913264"
      },
      {
        "uuid": "fa0fc36e-0c1a-450f-900b-5a32df17a159",
        "clip": "dd9ad126-8b15-4121-84f5-f59dc3f85802",
        "created_on": "2023-11-23T20:44:32.913377"
      },
      {
        "uuid": "178bf156-2bab-4970-8a76-4abfdc4b31b7",
        "clip": "0a53e14f-f51c-4ade-9567-87c2b02bd197",
        "created_on": "2023-11-23T20:44:32.913386"
      },
      {
        "uuid": "dc71d06a-54d6-4fad-ac6a-205bbee7ec96",
        "clip": "ae455d97-d1fa-43a1-b179-907bee8471e6",
        "created_on": "2023-11-23T20:44:32.913392"
      },
      {
        "uuid": "b418abd6-52cc-4ef4-9725-33d3d38b7878",
        "clip": "0b42c9e1-0289-4f03-b3cb-10859a80103e",
        "created_on": "2023-11-23T20:44:32.913400"
      },
      {
        "uuid": "d18ca56f-2f13-44dd-be43-f995e4d2edb6",
        "clip": "56329c3c-5beb-48ed-8a79-be4e9bd2d9ed",
        "created_on": "2023-11-23T20:44:32.913415"
      },
      {
        "uuid": "8dfa7d26-5f8b-4f37-bf0a-c031373b22f7",
        "clip": "57e0b23a-8663-49f3-8455-997e0f4d2b49",
        "created_on": "2023-11-23T20:44:32.913430"
      },
      {
        "uuid": "ebc5e8d0-bd3e-4d01-95e0-d9a0ce337ac9",
        "clip": "c7b6f927-fd3b-40ac-9b25-d682eec2d3ac",
        "created_on": "2023-11-23T20:44:32.913437"
      }
    ],
    "created_on": "2023-11-23T20:44:32.913488"
  }
}
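In this file, objects refer to each other by UUID: each entry in "clip_annotations" names a clip, and each clip names its recording. As a rough sketch of how those references chain together, the snippet below joins them with plain dictionaries. It uses a trimmed copy of the first two entries from the output above and only the standard library. The variable names are illustrative and not part of soundevent.

```python
# Trimmed copy of the "data" section shown above (first two entries only).
data = {
    "recordings": [
        {
            "uuid": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
            "path": "train/nips4b_birds_trainfile079.wav",
        },
        {
            "uuid": "35e120d0-1633-4864-88c9-063aca992747",
            "path": "train/nips4b_birds_trainfile237.wav",
        },
    ],
    "clips": [
        {
            "uuid": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
            "recording": "8392b0ff-293f-4d5b-bc1b-d40d2a0eb0dc",
            "start_time": 0.0,
            "end_time": 5.00390022675737,
        },
        {
            "uuid": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
            "recording": "35e120d0-1633-4864-88c9-063aca992747",
            "start_time": 0.0,
            "end_time": 5.00390022675737,
        },
    ],
    "clip_annotations": [
        {
            "uuid": "0e7a786e-48e4-4424-a8da-bf080bdefd9e",
            "clip": "b283a71f-4aa8-4ee2-ac83-7c7d5d0af2bc",
        },
        {
            "uuid": "af9713a7-2bf4-45df-bdff-63d47de34d71",
            "clip": "e2ec7571-bfe1-4682-8d4d-15e3e34edfc9",
        },
    ],
}

# Index recordings and clips by UUID so references can be followed directly.
recordings_by_uuid = {rec["uuid"]: rec for rec in data["recordings"]}
clips_by_uuid = {clip["uuid"]: clip for clip in data["clips"]}

# Follow clip_annotation -> clip -> recording for every annotation.
rows = []
for clip_annotation in data["clip_annotations"]:
    clip = clips_by_uuid[clip_annotation["clip"]]
    recording = recordings_by_uuid[clip["recording"]]
    rows.append((recording["path"], clip["start_time"], clip["end_time"]))

for path, start, end in rows:
    print(f"{path}: {start:.3f}s to {end:.3f}s")
```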

Loading Annotation Projects

The load function can be used to load the annotations into Python and obtain an AnnotationSet object directly.

nips4b_sample = io.load(annotation_path, type="annotation_set")
print(repr(nips4b_sample))

Out:

AnnotationSet(created_on=datetime.datetime(2023, 11, 23, 20, 44, 32, 913488), name=None, description=None)

This object allows you to access and analyze the annotations, along with their associated objects.

for clip_annotation in nips4b_sample.clip_annotations:
    clip = clip_annotation.clip
    recording = clip.recording
    print(
        f"* Recording {recording.path} [from "
        f"{clip.start_time:.3f}s to {clip.end_time:.3f}s]"
    )
    print(
        f"\t{len(clip_annotation.sound_events)} sound event annotations found"
    )
    for annotation in clip_annotation.sound_events:
        sound_event = annotation.sound_event
        start_time, end_time = sound_event.geometry.coordinates
        print(f"\t+ Sound event from {start_time:.3f}s to {end_time:.3f}s")
        for tag in annotation.tags:
            print(f"\t\t- {tag}")
    print("")

Out:

* Recording train/nips4b_birds_trainfile079.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile237.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile587.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile106.wav [from 0.000s to 4.069s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile430.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile661.wav [from 0.000s to 1.387s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile429.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile633.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile200.wav [from 0.000s to 5.004s]
        0 sound event annotations found

* Recording train/nips4b_birds_trainfile545.wav [from 0.000s to 2.792s]
        0 sound event annotations found

Saving Annotation Projects

Saving the annotation project is just as straightforward using the save function:

io.save(nips4b_sample, "nips4b_plus_sample.json")

Model Runs

Finally, the outputs of a model run can also be stored in the AOEF format. You can save and load model runs using the save and load functions, respectively. The loading function reads the AOEF file and returns a ModelRun object that can be used for further analysis.
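Because the same file layout carries several kinds of collections, it can be useful to check what a file contains before loading it. The sketch below assumes the envelope seen in the two example files above, where "collection_type" sits under "data". The helper peek_collection_type is hypothetical and not part of soundevent, and the only values confirmed by the examples here are "dataset" and "annotation_set".

```python
import json
import tempfile
from pathlib import Path


def peek_collection_type(path: Path) -> str:
    """Return the ``collection_type`` stored in an AOEF file.

    Hypothetical helper, not part of soundevent. Assumes the envelope
    layout shown in the example files above.
    """
    with open(path) as file:
        contents = json.load(file)
    return contents["data"]["collection_type"]


# Write a tiny stand-in file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmpdir:
    example_path = Path(tmpdir) / "example.json"
    example_path.write_text(
        json.dumps({"version": "1.1.0", "data": {"collection_type": "dataset"}})
    )
    collection_type = peek_collection_type(example_path)

print(collection_type)
```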

With the save and load functions provided by soundevent, you can manage and exchange acoustic data objects in the AOEF format, which makes it easier to share data with other researchers.
