Transforms Module

soundevent.transforms

Data transformations for soundevent objects.

This module provides a framework for applying transformations to soundevent data objects. At its core is the TransformBase class, which implements a visitor pattern for traversing the nested hierarchy of soundevent data models.

The module also includes concrete transform classes for common data manipulation tasks, such as modifying recording paths (PathTransform) or transforming tags (TagsTransform).

These tools are designed to help users clean, modify, and standardize their bioacoustic datasets in a structured and reliable way.

Modules:

Name  Description
base  Base classes for data transformations.
path  Transformations for recording paths.
tags  Transformations for tags.

Classes:

Name           Description
PathTransform  A transform for modifying the path of recordings.
TagsTransform  A transform for modifying sequences of tags.
TransformBase  Base class for creating data transformations.

Classes

PathTransform(transform)

Bases: TransformBase

A transform for modifying the path of recordings.

This class provides a convenient way to apply a path transformation to all Recording objects within a larger data structure (such as a Dataset or AnnotationProject). It works by overriding the transform_path method of TransformBase.

Parameters:

Name       Type                    Description                                                                                Default
transform  Callable[[Path], Path]  A function that takes a pathlib.Path object and returns a transformed pathlib.Path object.  required

Examples:

>>> from pathlib import Path
>>> from soundevent import data
>>> from soundevent.transforms import PathTransform
>>>
>>> # Create a sample dataset to work with
>>> recording = data.Recording(
...     path=Path("../relative/path/rec.wav"),
...     duration=1,
...     channels=1,
...     samplerate=16000,
... )
>>> dataset = data.Dataset(name="test-dataset", recordings=[recording])
>>>
>>> # Define a function to make all paths absolute
>>> def make_absolute(path: Path) -> Path:
...     # Simplistic: resolve() uses the current working directory; real data often needs an explicit base directory
...     return path.resolve()
>>>
>>> # Create and apply the transform
>>> path_transformer = PathTransform(transform=make_absolute)
>>> transformed_dataset = path_transformer.transform_dataset(dataset)
>>>
>>> # Check that the path in the transformed dataset is absolute
>>> transformed_dataset.recordings[0].path.is_absolute()
True
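
The example above resolves paths against the current working directory. When recordings have moved from one root directory to another, a function that rebases paths may be a better fit. The following is a minimal sketch using only pathlib; make_rebaser and the directory names are hypothetical and not part of soundevent. The returned function has the Callable[[Path], Path] shape that the transform parameter expects.

```python
from pathlib import Path
from typing import Callable


def make_rebaser(old_base: Path, new_base: Path) -> Callable[[Path], Path]:
    """Return a function that moves paths from old_base to new_base.

    Hypothetical helper for illustration; not part of soundevent.
    """

    def rebase(path: Path) -> Path:
        try:
            relative = path.relative_to(old_base)
        except ValueError:
            # The path is not located under old_base: leave it unchanged.
            return path
        return new_base / relative

    return rebase


# Illustrative directory names only.
rebase = make_rebaser(Path("/mnt/old_drive/audio"), Path("/data/audio"))
moved = rebase(Path("/mnt/old_drive/audio/site_a/rec.wav"))
untouched = rebase(Path("/somewhere/else/rec.wav"))
```

Under these assumptions, PathTransform(transform=rebase) would apply the function to every recording path, in the same way as make_absolute in the example above.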

Methods:

Name            Description
transform_path  Apply the transformation to a path.

Attributes:

Name       Type  Description
transform

Attributes

transform = transform (instance attribute)

Functions

transform_path(path)

Apply the transformation to a path.

Parameters:

Name  Type  Description             Default
path  Path  The path to transform.  required

Returns:

Type  Description
Path  The transformed path.

TagsTransform(transform)

Bases: TransformBase

A transform for modifying sequences of tags.

This class provides a way to apply a transformation to all Tag sequences within a soundevent data structure. It is useful for filtering, renaming, or otherwise modifying tags across an entire dataset.

It can be initialized directly with a function that transforms a whole sequence of tags, or it can be constructed from a function that transforms a single tag using the from_tag_transform class method.

Parameters:

Name       Type                                     Description                                                                                        Default
transform  Callable[[Sequence[Tag]], Sequence[Tag]]  A function that takes a sequence of Tag objects and returns a transformed sequence of Tag objects.  required

Examples:

>>> from pathlib import Path
>>> from soundevent import data
>>> from soundevent.transforms import TagsTransform
>>>
>>> # Create a sample recording with a misspelled species tag
>>> recording = data.Recording(
...     path=Path("rec.wav"),
...     duration=1,
...     channels=1,
...     samplerate=16000,
...     tags=[
...         data.Tag(key="species", value="Myotis mytis"),
...         data.Tag(key="quality", value="good"),
...     ],
... )
>>>
>>> # Create a transform to correct the spelling of "Myotis myotis"
>>> def correct_species_name(tag: data.Tag) -> data.Tag:
...     if tag.key == "species" and tag.value == "Myotis mytis":
...         return tag.model_copy(update={"value": "Myotis myotis"})
...     return tag
>>> corrector = TagsTransform.from_tag_transform(
...     transform=correct_species_name
... )
>>> transformed_recording = corrector.transform_recording(recording)
>>>
>>> # Verify that the tag value has been corrected
>>> species_tag = next(
...     t for t in transformed_recording.tags if t.key == "species"
... )
>>> species_tag.value
'Myotis myotis'
>>>
>>> # Verify that other tags are untouched
>>> quality_tag = next(
...     t for t in transformed_recording.tags if t.key == "quality"
... )
>>> quality_tag.value
'good'
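
The example above uses from_tag_transform because the correction applies to each tag independently. Some operations need to see the whole sequence at once, which is the case for passing a sequence-level function directly to the constructor. The following is a minimal sketch of such a function; Tag here is a stand-in dataclass with the same key and value fields as data.Tag, and drop_duplicate_tags is a hypothetical helper, not part of soundevent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Tag:
    """Stand-in for soundevent.data.Tag (illustration only)."""

    key: str
    value: str


def drop_duplicate_tags(tags: Sequence[Tag]) -> List[Tag]:
    """Keep the first occurrence of each (key, value) pair, preserving order."""
    seen = set()
    unique = []
    for tag in tags:
        identity = (tag.key, tag.value)
        if identity not in seen:
            seen.add(identity)
            unique.append(tag)
    return unique


tags = [
    Tag(key="species", value="Myotis myotis"),
    Tag(key="quality", value="good"),
    Tag(key="species", value="Myotis myotis"),
]
deduplicated = drop_duplicate_tags(tags)
```

A function with this shape could be passed as TagsTransform(transform=drop_duplicate_tags), assuming it is written against the real data.Tag class rather than the stand-in.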

Methods:

Name                Description
from_tag_transform  Create a TagsTransform from a function that transforms a single tag.
transform_tags      Apply the transformation to a sequence of tags.

Attributes:

Name       Type  Description
transform

Attributes

transform = transform (instance attribute)

Functions

from_tag_transform(transform) (classmethod)

Create a TagsTransform from a function that transforms a single tag.

This factory method is a convenient way to create a TagsTransform when your logic applies to each tag individually.

Parameters:

Name       Type                           Description                                                                                                                                              Default
transform  Callable[[Tag], Optional[Tag]]  A function that takes a single Tag object and returns either a transformed Tag object or None. If None is returned, the tag is removed from the sequence.  required

Returns:

Type           Description
TagsTransform  A new TagsTransform instance.
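
The library's implementation of this factory is not shown on this page. As a rough sketch of the documented behaviour, a per-tag function can be lifted to a sequence-level function by applying it to each tag and dropping every None result. Tag is a stand-in dataclass, and lift_tag_transform and clean are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Tag:
    """Stand-in for soundevent.data.Tag (illustration only)."""

    key: str
    value: str


def lift_tag_transform(
    transform: Callable[[Tag], Optional[Tag]],
) -> Callable[[Sequence[Tag]], List[Tag]]:
    """Turn a per-tag function into a function over a sequence of tags."""

    def transform_sequence(tags: Sequence[Tag]) -> List[Tag]:
        result = []
        for tag in tags:
            new_tag = transform(tag)
            # A None result means the tag is removed from the sequence.
            if new_tag is not None:
                result.append(new_tag)
        return result

    return transform_sequence


def clean(tag: Tag) -> Optional[Tag]:
    """Drop quality tags and strip whitespace from species names."""
    if tag.key == "quality":
        return None
    if tag.key == "species":
        return replace(tag, value=tag.value.strip())
    return tag


tags = [
    Tag(key="species", value=" Myotis myotis "),
    Tag(key="quality", value="good"),
    Tag(key="site", value="A1"),
]
cleaned = lift_tag_transform(clean)(tags)
```

Here the quality tag is removed, the species value is trimmed, and the site tag passes through unchanged.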

transform_tags(tags)

Apply the transformation to a sequence of tags.

Parameters:

Name  Type           Description                         Default
tags  Sequence[Tag]  The sequence of tags to transform.  required

Returns:

Type           Description
Sequence[Tag]  The transformed sequence of tags.

TransformBase

Base class for creating data transformations.

This class implements the visitor pattern to traverse the complex hierarchy of soundevent data objects. It provides transform_* methods for each type of data object in the soundevent ecosystem.

The default implementation of each transform_* method returns the object unchanged or, for container-like objects, recursively calls the appropriate transform methods on their children and returns a new container with the transformed children.

To create a custom transformation, inherit from this class and override the transform_* method for the specific object or attribute you want to modify.

Examples:

>>> from soundevent import data
>>> from soundevent.transforms.base import TransformBase
>>>
>>> class UserAnonymizer(TransformBase):
...     def transform_user(self, user: data.User) -> data.User:
...         return user.model_copy(update={"name": "anonymous"})

Methods:

Name Description
transform_annotation_project
transform_annotation_set
transform_annotation_task
transform_clip
transform_clip_annotation
transform_clip_prediction
transform_dataset
transform_evaluation_set
transform_features
transform_geometry
transform_model_run
transform_notes
transform_path
transform_predicted_tags
transform_prediction_set
transform_recording
transform_recording_set
transform_sequence
transform_sequence_annotation
transform_sequence_prediction
transform_sound_event
transform_sound_event_annotation
transform_sound_event_prediction
transform_status_badge
transform_tags
transform_user
Functions
transform_annotation_project(annotation_project)
transform_annotation_set(annotation_set)
transform_annotation_task(annotation_task)
transform_clip(clip)
transform_clip_annotation(clip_annotation)
transform_clip_prediction(clip_prediction)
transform_dataset(dataset)
transform_evaluation_set(evaluation_set)
transform_features(features)
transform_geometry(geometry)
transform_model_run(model_run)
transform_notes(notes)
transform_path(path)
transform_predicted_tags(predicted_tags)
transform_prediction_set(prediction_set)
transform_recording(recording)
transform_recording_set(recording_set)
transform_sequence(sequence)
transform_sequence_annotation(sequence_annotation)
transform_sequence_prediction(sequence_prediction)
transform_sound_event(sound_event)
transform_sound_event_annotation(sound_event_annotation)
transform_sound_event_prediction(sound_event_prediction)
transform_status_badge(status_badge)
transform_tags(tags)
transform_user(user)