Loading audio#
One of the fundamental operations in computational bioacoustics is reading
audio files into a workable format. In soundevent, we use xarray.DataArray
objects to hold loaded audio data. xarray.DataArray objects are an extension
of numpy arrays, so if you are already familiar with numpy arrays there are
few new concepts to learn.
Why use xarray.DataArray objects?
xarray.DataArray objects offer two key benefits: coordinates, which make it
easier to reference time-related locations in the array, and the ability to
store additional metadata, such as samplerate, time_expansion, and the fact
that the temporal units are seconds. To learn more about xarray.DataArray
objects, see the xarray documentation.
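To make these two benefits concrete, here is a minimal sketch that builds a
DataArray by hand from synthetic noise rather than from a file. The 8000 Hz
samplerate, the variable names, and the attribute values are assumptions
chosen for illustration; they are not what soundevent itself produces.

```python
import numpy as np
import xarray as xr

samplerate = 8000  # Hz, arbitrary value chosen for this illustration
duration = 2.0  # seconds
num_samples = int(samplerate * duration)

# Synthetic single-channel signal standing in for real audio.
rng = np.random.default_rng(0)
samples = rng.standard_normal((num_samples, 1))

signal_array = xr.DataArray(
    samples,
    dims=("time", "channel"),
    coords={
        # Each sample is labelled with its time in seconds.
        "time": ("time", np.arange(num_samples) / samplerate, {"units": "s"}),
        "channel": [0],
    },
    # Free-form metadata travels with the array (illustrative values).
    attrs={"samplerate": samplerate, "units": "V"},
)

# Select by time labels instead of computing sample indices by hand.
segment = signal_array.sel(time=slice(0.5, 1.0))

print(segment.sizes["time"])
print(signal_array.attrs)
print(signal_array.time.attrs)

# The underlying data is still a plain numpy array.
print(type(signal_array.values))
```

Label-based slicing with sel includes both endpoints, so the selected
segment contains the sample at 0.5 s and the sample at 1.0 s.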
Supported audio formats
soundevent supports most of the audio file formats supported by the soundfile
library. Some formats were excluded because they do not support seeking and
are therefore not suitable for random access. The supported set still covers
most of the common audio file formats, such as WAV, FLAC, AIFF, and MP3. For a
full list of supported formats, see the audio.is_audio_file documentation.
Getting a Recording object#
To create a data.Recording object from an audio file, you can use the
from_file method. This method extracts the metadata from the file and
populates the Recording object with the relevant information.
from soundevent import data
recording = data.Recording.from_file("sample_audio.wav")
print(repr(recording))
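To give a feel for the kind of metadata that can be read from a file header,
the following sketch uses only Python's standard wave module on a short tone
that it writes itself. It is not how soundevent implements from_file; the
helpers write_tone and read_wav_metadata are hypothetical names introduced
for this example.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_tone(path, samplerate=22050, duration=0.5, freq=440.0):
    """Write a mono 16-bit sine tone so the example has a file to inspect."""
    num_samples = int(samplerate * duration)
    values = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / samplerate))
        for i in range(num_samples)
    ]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(samplerate)
        f.writeframes(struct.pack(f"<{num_samples}h", *values))


def read_wav_metadata(path):
    """Read basic metadata from the WAV header without decoding the audio."""
    with wave.open(str(path), "rb") as f:
        samplerate = f.getframerate()
        samples = f.getnframes()
        return {
            "path": str(path),
            "samplerate": samplerate,
            "channels": f.getnchannels(),
            "samples": samples,
            "duration": samples / samplerate,
        }


with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "tone.wav"
    write_tone(wav_path)
    metadata = read_wav_metadata(wav_path)

print(metadata)
```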
Loading the audio#
Once you have a data.Recording object, you can load the audio data using the
audio.load_recording function:
from soundevent import audio
wav = audio.load_recording(recording)
print(repr(wav))
Out:
<xarray.DataArray (time: 66150, channel: 1)> Size: 529kB
array([[ 9.15527344e-05],
[ 2.13623047e-04],
[ 0.00000000e+00],
...,
[-3.66210938e-04],
[-1.30310059e-02],
[-6.92749023e-03]])
Coordinates:
* time (time) float64 529kB 0.0 4.535e-05 9.07e-05 ... 3.0 3.0 3.0
* channel (channel) int64 8B 0
Attributes:
recording_id: c20a1a09-5c1e-45ac-b7a9-5e4391e7d23e
path: sample_audio.wav
units: V
standard_name: amplitude
long_name: Amplitude
Note that the returned object is an xarray.DataArray object with two
dimensions: time and channel. The time coordinate holds the time in seconds
of each sample in the xarray.DataArray object.
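The relation between sample index and time can be sketched with plain
arithmetic. The 22050 Hz samplerate is not stated in the text; it is inferred
here from the output above (66150 samples and a time step of about
4.535e-05 s). The helper names are hypothetical.

```python
samplerate = 22050  # Hz, inferred from the output shown above (assumption)
num_samples = 66150  # length of the time dimension in the output above


def time_of_sample(index, samplerate):
    """Time in seconds of the sample at the given index."""
    return index / samplerate


def sample_at_time(time, samplerate):
    """Index of the sample nearest to the given time in seconds."""
    return round(time * samplerate)


times = [time_of_sample(i, samplerate) for i in range(num_samples)]

step = times[1] - times[0]  # spacing between consecutive samples
duration = num_samples / samplerate  # total length of the recording

print(f"first time: {times[0]}")
print(f"step: {step:.4g} s")
print(f"last time: {times[-1]:.6f} s")
print(f"duration: {duration} s")
print(f"sample index at 1 s: {sample_at_time(1.0, samplerate)}")
```

The last sample sits one step before the 3 s mark, which the shortened
display of the time coordinate above rounds to 3.0.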
Selecting clips from a recording#
You can use the sel method of xarray.DataArray to select a clip from the
recording. This is useful when you have the full file loaded into memory and
want to extract a specific clip:
# You can select a clip by specifying the start and end times in seconds.
subwav = wav.sel(time=slice(0, 1))
print(repr(subwav))
Out:
<xarray.DataArray (time: 22051, channel: 1)> Size: 176kB
array([[ 9.15527344e-05],
[ 2.13623047e-04],
[ 0.00000000e+00],
...,
[-9.88769531e-03],
[-6.65283203e-03],
[-1.06811523e-03]])
Coordinates:
* time (time) float64 176kB 0.0 4.535e-05 9.07e-05 ... 0.9999 1.0 1.0
* channel (channel) int64 8B 0
Attributes:
recording_id: c20a1a09-5c1e-45ac-b7a9-5e4391e7d23e
path: sample_audio.wav
units: V
standard_name: amplitude
long_name: Amplitude
Alternatively, if you only need to load a clip from the file without loading
the entire file into memory, you can use the audio.load_clip function:
clip = data.Clip(
    recording=recording,
    start_time=0,
    end_time=1,
)
subwav2 = audio.load_clip(clip)
print(repr(subwav2))
Out:
<xarray.DataArray (time: 22050, channel: 1)> Size: 176kB
array([[ 9.15527344e-05],
[ 2.13623047e-04],
[ 0.00000000e+00],
...,
[-1.11694336e-02],
[-9.88769531e-03],
[-6.65283203e-03]])
Coordinates:
* time (time) float64 176kB 0.0 4.535e-05 9.07e-05 ... 0.9999 0.9999 1.0
* channel (channel) int64 8B 0
Attributes:
recording_id: c20a1a09-5c1e-45ac-b7a9-5e4391e7d23e
clip_id: 0e43f052-c353-4324-97db-47efe0e9e524
path: sample_audio.wav
units: V
standard_name: amplitude
long_name: Amplitude
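The idea of reading a clip without loading the whole file can be illustrated
with the standard wave module, which supports seeking to a frame position.
This is only a sketch of the general technique, assuming a mono 16-bit WAV
file; write_ramp and read_clip are hypothetical helpers, and audio.load_clip
may work differently internally.

```python
import struct
import tempfile
import wave
from pathlib import Path


def write_ramp(path, samplerate=8000, num_samples=24000):
    """Write a mono 16-bit file in which each sample value equals its index."""
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(samplerate)
        f.writeframes(struct.pack(f"<{num_samples}h", *range(num_samples)))


def read_clip(path, start_time, end_time):
    """Read only the samples in [start_time, end_time) from a mono 16-bit WAV."""
    with wave.open(str(path), "rb") as f:
        samplerate = f.getframerate()
        start = round(start_time * samplerate)
        stop = round(end_time * samplerate)
        f.setpos(start)  # seek directly to the first frame of the clip
        raw = f.readframes(stop - start)  # read just the frames that are needed
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


with tempfile.TemporaryDirectory() as tmp:
    ramp_path = Path(tmp) / "ramp.wav"
    write_ramp(ramp_path)
    clip_samples = read_clip(ramp_path, start_time=1.0, end_time=2.0)

# Because sample values encode their index, the clip boundaries are visible.
print(len(clip_samples), clip_samples[0], clip_samples[-1])
```

Formats that cannot seek would force a reader to decode everything before the
clip, which is the reason given above for excluding them.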
In most cases, the results from wav.sel and audio.load_clip will be the same,
except for the last sample: wav.sel also includes the sample at the end time,
which gives 22051 samples instead of 22050 in the outputs above. The
difference is negligible, and the audio.load_clip function is generally
preferred for efficiency.
You can verify the similarity of the clips by comparing them after dropping
the last sample of the wav.sel result.
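A minimal, self-contained sketch of such a check follows. It uses a synthetic
tone and plain Python lists in place of the loaded arrays, so it runs without
the sample file. The 22050 Hz samplerate is inferred from the outputs above,
and the assumption that the clip loader reads a half-open sample range is
based only on the shapes shown there; both selection helpers are hypothetical.

```python
import math

samplerate = 22050  # Hz, inferred from the outputs above (assumption)
num_samples = 66150  # 3 seconds of audio

# Synthetic stand-in for the loaded recording and its time coordinate.
signal = [math.sin(2 * math.pi * 440 * i / samplerate) for i in range(num_samples)]
times = [i / samplerate for i in range(num_samples)]


def select_by_label(signal, times, start, end):
    """Keep samples whose time lies in [start, end], both ends included."""
    return [s for s, t in zip(signal, times) if start <= t <= end]


def select_by_index(signal, samplerate, start, end):
    """Keep samples in the half-open sample range [start, end)."""
    return signal[int(start * samplerate):int(end * samplerate)]


by_label = select_by_label(signal, times, 0, 1)
by_index = select_by_index(signal, samplerate, 0, 1)

# Drop the extra final sample from the inclusive selection, then compare.
same = len(by_label) == len(by_index) + 1 and all(
    math.isclose(a, b, abs_tol=1e-12) for a, b in zip(by_label[:-1], by_index)
)

print(len(by_label), len(by_index), same)  # prints: 22051 22050 True
```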