Loading audio

One of the fundamental operations in computational bioacoustics is reading audio files into a workable format. In soundevent, loaded audio is held in xarray.DataArray objects. A DataArray wraps a numpy array and adds labeled dimensions, coordinates, and metadata, so most of what you already know about numpy arrays carries over directly.

Why use xarray.DataArray objects?

xarray.DataArray objects offer two key benefits: coordinates, which let you refer to positions in the array by time rather than by sample index, and attributes, which store additional metadata such as samplerate and time_expansion and record that the time axis is measured in seconds. To learn more about xarray.DataArray objects, see the xarray documentation.
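The following minimal sketch builds a DataArray with the same (time, channel) layout from a synthetic 440 Hz tone, then selects a segment by seconds. It uses xarray and numpy directly, not soundevent; the attribute names and where the metadata is stored are assumptions for illustration, and soundevent's own layout may differ (compare the Attributes block in the loading output of this page).

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for loaded audio: 3 s of a 440 Hz tone at 22050 Hz.
samplerate = 22050
n_samples = 3 * samplerate
times = np.arange(n_samples) / samplerate
samples = np.sin(2 * np.pi * 440 * times)[:, np.newaxis]  # shape (time, channel)

# Label the axes, attach a time coordinate in seconds, and store metadata.
# Attribute names and placement are illustrative assumptions.
wav = xr.DataArray(
    samples,
    dims=("time", "channel"),
    coords={"time": ("time", times, {"units": "s"}), "channel": [0]},
    attrs={"samplerate": samplerate, "time_expansion": 1},
)

# Index by seconds instead of by sample number.
first_half_second = wav.sel(time=slice(0, 0.5))

# The underlying numpy array is still available.
raw = wav.values
```

Because sel works on coordinate labels, the same call keeps working if the sample rate changes; only the number of samples returned changes.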

Supported audio formats

soundevent supports most of the audio file formats supported by the soundfile library. A few formats are excluded because they do not support seeking, which makes them unsuitable for random access (reading a segment without first decoding everything before it). The supported set still covers the most common formats, such as WAV, FLAC, AIFF, and MP3. For the full list, see the audio.is_audio_file documentation.
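To see why seeking matters, here is a minimal sketch using Python's standard-library wave module (not soundfile or soundevent) on a small in-memory WAV file. The reader jumps straight to frame 40 and reads five frames; a format without seeking support would have to read every earlier frame to reach the same point.

```python
import io
import struct
import wave

# Build a small mono 16-bit WAV file in memory so the example is self-contained.
samplerate = 8000
frames = list(range(100))  # sample values 0..99
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 16-bit samples
    writer.setframerate(samplerate)
    writer.writeframes(struct.pack(f"<{len(frames)}h", *frames))

# Random access: position the reader at frame 40 and read 5 frames,
# without reading the frames that come before it.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    reader.setpos(40)
    chunk = reader.readframes(5)

values = struct.unpack("<5h", chunk)  # decode little-endian 16-bit samples
```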

Getting a Recording object

To create a data.Recording object from an audio file, use the Recording.from_file class method. It reads the file's metadata (such as its duration, sample rate, and number of channels) and populates the Recording object with that information.

from soundevent import data

recording = data.Recording.from_file("sample_audio.wav")
print(repr(recording))

Out:

Recording(path=PosixPath('sample_audio.wav'))
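To give a rough idea of the kind of metadata Recording.from_file collects, the sketch below reads the sample rate, channel count, and duration from a WAV header with the standard-library wave module. The helper read_wav_metadata is hypothetical and is not soundevent's implementation; it only handles WAV files.

```python
import io
import wave


def read_wav_metadata(fileobj):
    """Hypothetical helper: read basic metadata from a WAV header.

    Illustrates the kind of information Recording.from_file gathers;
    this is not soundevent's implementation.
    """
    with wave.open(fileobj, "rb") as reader:
        samplerate = reader.getframerate()
        channels = reader.getnchannels()
        n_frames = reader.getnframes()
    return {
        "samplerate": samplerate,
        "channels": channels,
        "duration": n_frames / samplerate,  # in seconds
    }


# Self-contained demo: 2 s of silent 16-bit stereo audio at 16 kHz, in memory.
samplerate = 16000
n_frames = 2 * samplerate
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)  # bytes per sample
    writer.setframerate(samplerate)
    writer.writeframes(bytes(n_frames * 2 * 2))  # frames x channels x bytes
buffer.seek(0)

metadata = read_wav_metadata(buffer)
```

Only the header is consulted here, which is why creating a Recording is cheap compared with loading the audio samples.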

Loading the audio

Once you have a data.Recording object, you can load the audio data using the audio.load_recording function:

from soundevent import audio

wav = audio.load_recording(recording)
print(wav)

Out:

<xarray.DataArray (time: 66150, channel: 1)> Size: 529kB
array([[ 9.15527344e-05],
       [ 2.13623047e-04],
       [ 0.00000000e+00],
       ...,
       [-3.66210938e-04],
       [-1.30310059e-02],
       [-6.92749023e-03]])
Coordinates:
  * time     (time) float64 529kB 0.0 4.535e-05 9.07e-05 ... 3.0 3.0 3.0
  * channel  (channel) int64 8B 0
Attributes:
    recording_id:   cf47aa83-8f14-4203-a94b-fad35d49e721
    path:           sample_audio.wav
    units:          V
    standard_name:  amplitude
    long_name:      Amplitude

The returned object is an xarray.DataArray with two dimensions, time and channel. The time coordinate holds the time, in seconds, of each sample in the array.
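The time coordinate follows directly from the sample rate: sample i sits at i / samplerate seconds. The printed spacing of about 4.535e-05 s suggests a sample rate of 22050 Hz; that value is inferred from the output above rather than stated by soundevent. A short numpy check reproduces the printed values:

```python
import numpy as np

# Values read off the printed output: 66150 samples, spacing ~4.535e-05 s.
n_samples = 66150
samplerate = 22050  # Hz, inferred as 1 / 4.535e-05 (an assumption)

# Sample i sits at time i / samplerate seconds.
times = np.arange(n_samples) / samplerate

step = times[1] - times[0]          # ~4.535e-05 s between samples
last = times[-1]                    # ~2.99995 s, printed as 3.0
duration = n_samples / samplerate   # 3.0 s of audio
```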

Selecting clips from a recording

You can use the sel method of xarray.DataArray to select a clip from the recording. This is useful when you have the full file loaded into memory and want to extract a specific clip:

# You can select a clip by specifying the start and end times in seconds.
subwav = wav.sel(time=slice(0, 1))
print(repr(subwav))

Out:

<xarray.DataArray (time: 22051, channel: 1)> Size: 176kB
array([[ 9.15527344e-05],
       [ 2.13623047e-04],
       [ 0.00000000e+00],
       ...,
       [-9.88769531e-03],
       [-6.65283203e-03],
       [-1.06811523e-03]])
Coordinates:
  * time     (time) float64 176kB 0.0 4.535e-05 9.07e-05 ... 0.9999 1.0 1.0
  * channel  (channel) int64 8B 0
Attributes:
    recording_id:   cf47aa83-8f14-4203-a94b-fad35d49e721
    path:           sample_audio.wav
    units:          V
    standard_name:  amplitude
    long_name:      Amplitude

Alternatively, if you only need to load a clip from the file without loading the entire file into memory, you can use the audio.load_clip function:

clip = data.Clip(
    recording=recording,
    start_time=0,
    end_time=1,
)
subwav2 = audio.load_clip(clip)
print(repr(subwav2))

Out:

<xarray.DataArray (time: 22050, channel: 1)> Size: 176kB
array([[ 9.15527344e-05],
       [ 2.13623047e-04],
       [ 0.00000000e+00],
       ...,
       [-1.11694336e-02],
       [-9.88769531e-03],
       [-6.65283203e-03]])
Coordinates:
  * time     (time) float64 176kB 0.0 4.535e-05 9.07e-05 ... 0.9999 0.9999 1.0
  * channel  (channel) int64 8B 0
Attributes:
    recording_id:   cf47aa83-8f14-4203-a94b-fad35d49e721
    clip_id:        35ec1d89-2181-4a9c-80ff-85d3a8d195d0
    path:           sample_audio.wav
    units:          V
    standard_name:  amplitude
    long_name:      Amplitude

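In most cases, wav.sel and audio.load_clip return the same samples, except at the very end: label-based slicing with sel includes both endpoints, so subwav keeps the sample at exactly 1 s and has one more sample than subwav2 (22051 versus 22050). The difference is negligible, and audio.load_clip is generally preferred because it reads only the requested segment rather than the whole file.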

You can verify the similarity of the clips:

import numpy as np

print(np.allclose(subwav[:-1], subwav2))

Out:

True
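The extra sample in subwav, which subwav[:-1] drops before the comparison, can be reproduced with plain numpy. This sketch assumes a sample rate of 22050 Hz (inferred from the 4.535e-05 s spacing in the printed time coordinate) and compares a closed interval [0, 1], as used by label-based slicing, with a half-open interval [0, 1], which matches the 22050 samples returned by audio.load_clip.

```python
import numpy as np

samplerate = 22050  # Hz, inferred from the printed time coordinate (assumption)
times = np.arange(66150) / samplerate

# Closed interval [0, 1]: like wav.sel(time=slice(0, 1)), the sample at
# exactly t = 1.0 s is included.
closed = np.count_nonzero((times >= 0) & (times <= 1))

# Half-open interval [0, 1): stops one sample earlier.
half_open = np.count_nonzero((times >= 0) & (times < 1))
```

The two counts differ by exactly one sample, matching the shapes printed for subwav and subwav2.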
