Terms Module#

soundevent.terms #

Terms module.

This module provides tools for creating and managing standardized terms.

In the soundevent ecosystem, metadata is stored in Tag objects, which are pairs of a Term and a value. The Term provides a standardized definition that gives context and meaning to the value. For example, the Term scientific_name gives meaning to the value "Turdus migratorius". Using standardized terms makes data understandable, shareable, and interoperable.

This module provides three main features:

  1. Pre-defined Terms: A collection of standard terms for common concepts in bioacoustics (e.g., scientific_name, f1_score).
  2. Global API: A set of functions (find_term, add_term, get_term, etc.) for managing terms in a global registry. add_terms_from_file is a convenience function that loads terms from a file and then registers them.
  3. TermRegistry Class: The underlying class for creating and managing custom term collections.

Examples:

>>> from soundevent.data import Tag, Term
>>> from soundevent.terms import scientific_name, add_term, find_term
>>>
>>> # A list of tags that might be attached to a sound event
>>> tags = []
>>>
>>> # Use a pre-defined term to create a Tag
>>> species_tag = Tag(term=scientific_name, value="Turdus migratorius")
>>> tags.append(species_tag)
>>>
>>> print(f"{tags[0].term.label}: {tags[0].value}")
Scientific Taxon Name: Turdus migratorius
>>>
>>> # Create and use a custom term for a new Tag
>>> add_term(
...     Term(
...         name="custom:quality",
...         label="Quality",
...         definition="The quality of the recording, from 1 (poor) to 5 (excellent).",
...     )
... )
>>> # Match on the label only; q="quality" would also search definitions
>>> quality_term = find_term(label="Quality")[0]
>>> quality_tag = Tag(term=quality_term, value="4")
>>> tags.append(quality_tag)
>>>
>>> print(f"{tags[1].term.label}: {tags[1].value}")
Quality: 4

Classes:

Name Description
TermRegistry

A mutable mapping for managing, storing, and retrieving Term objects.

TermSet

A collection of terms and their optional mappings.

Functions:

Name Description
add_term

Register a term.

find_term

Find terms by substring match.

get_term

Retrieve a term by its key.

get_term_by

Retrieve one term by an exact match on a single attribute.

has_term

Check if a term exists in the registry.

remove_term

Remove a term from the registry by its key.

add_terms_from_file

Load terms from a file and add them to a registry.

get_global_term_registry

Return the current global term registry.

set_global_term_registry

Set a new global term registry.

Classes#

TermRegistry(terms=None) #

Bases: MutableMapping[str, Term]

A mutable mapping for managing, storing, and retrieving Term objects.

Provides dictionary-like access (getting, setting, deleting by key) along with specialized methods for finding terms based on their attributes. It serves as a central point to manage and access standardized Term objects within a project.

Attributes:

Name Type Description
_terms Dict[str, Term]

The internal dictionary holding the registered terms.

Parameters:

Name Type Description Default
terms Optional[Dict[str, Term]]

A dictionary of initial terms {key: term} to populate the registry. Defaults to an empty registry.

None

Methods:

Name Description
add_term

Register a term, optionally defaulting the key to term.name.

find

Find terms by substring match; returns multiple terms.

get

Retrieve a term by key, returning a default if not found.

get_by

Retrieve one term by an exact match on a single attribute.

remove

Remove a term from the registry by its key.
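
The dictionary-like access described above follows from the MutableMapping base class. As a rough illustration only, the sketch below builds a stand-in registry with the standard library. SketchRegistry is a hypothetical name, plain strings stand in for Term objects, and none of this is the soundevent implementation.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, Optional


class SketchRegistry(MutableMapping):
    """Hypothetical stand-in; not the soundevent implementation."""

    def __init__(self, terms: Optional[Dict[str, str]] = None) -> None:
        # Copy the initial mapping so the caller's dict is not mutated.
        self._terms: Dict[str, str] = dict(terms or {})

    def __getitem__(self, key: str) -> str:
        return self._terms[key]

    def __setitem__(self, key: str, value: str) -> None:
        self._terms[key] = value

    def __delitem__(self, key: str) -> None:
        del self._terms[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._terms)

    def __len__(self) -> int:
        return len(self._terms)


# Plain strings stand in for Term objects to keep the sketch short.
registry = SketchRegistry({"dwc:genus": "Genus"})
registry["custom:quality"] = "Quality"

has_quality = "custom:quality" in registry  # __contains__ is a mixin method
missing = registry.get("not-registered")    # get() is a mixin method
del registry["dwc:genus"]
remaining = list(registry)
```

Implementing the five abstract methods is enough for MutableMapping to supply the rest (membership tests, get, keys, items, update and so on).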

Functions#
add_term(term, key=None, force=False) #

Register a term, optionally defaulting the key to term.name.

Parameters:

Name Type Description Default
term Term

The Term object to add.

required
key Optional[str]

The key to use for registration. If None, term.name is used. Defaults to None.

None
force bool

If True, allows overriding an existing term with the same key. If False, raises TermOverrideError if the key already exists.

False

Raises:

Type Description
TermOverrideError

If force is False and a term with the same key already exists.
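
The key defaulting and the force flag can be modelled in a few lines. This is a minimal sketch under stated assumptions: SketchTerm and SketchOverrideError are hypothetical stand-ins for Term and TermOverrideError, and a plain dict plays the role of the registry.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchTerm:
    """Hypothetical stand-in for soundevent.data.Term."""

    name: str
    label: str
    definition: str


class SketchOverrideError(Exception):
    """Hypothetical stand-in for TermOverrideError."""


def add_term(
    terms: Dict[str, SketchTerm],
    term: SketchTerm,
    key: Optional[str] = None,
    force: bool = False,
) -> None:
    # The key falls back to term.name when it is not given.
    key = term.name if key is None else key
    # An existing key is only replaced when force is True.
    if key in terms and not force:
        raise SketchOverrideError(key)
    terms[key] = term


registry: Dict[str, SketchTerm] = {}
quality = SketchTerm("custom:quality", "Quality", "Recording quality, 1 to 5.")
add_term(registry, quality)

try:
    add_term(registry, quality)  # same key again, force=False
    second_add_succeeded = True
except SketchOverrideError:
    second_add_succeeded = False

add_term(registry, quality, force=True)  # allowed: overwrite on purpose
add_term(registry, quality, key="quality")  # same term under a custom key
```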

find(label=None, name=None, uri=None, definition=None, q=None, ignore_case=True) #

Find terms by substring match; returns multiple terms.

If q is provided, it searches label, name, uri, and definition for a match (OR logic). If q is not provided, it searches using the specific fields, requiring all provided fields to match (AND logic). If no arguments are given, all terms are returned.

Parameters:

Name Type Description Default
label Optional[str]

Substring to search for in labels.

None
name Optional[str]

Substring to search for in names.

None
uri Optional[str]

Substring to search for in URIs.

None
definition Optional[str]

Substring to search for in definitions.

None
q Optional[str]

General query string (searches all fields, OR logic).

None
ignore_case bool

Perform case-insensitive search. Defaults to True.

True

Returns:

Type Description
list[Term]

A list of matching Term objects.

Raises:

Type Description
ValueError

If q is used with other specific criteria.
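
The OR and AND matching rules can be sketched as follows. This is an illustrative model written against the description above, not the library code; SketchTerm and the shortened definitions are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SketchTerm:
    """Hypothetical stand-in for soundevent.data.Term."""

    name: str
    label: str
    definition: str
    uri: Optional[str] = None


def find(terms, label=None, name=None, uri=None, definition=None,
         q=None, ignore_case=True) -> List[SketchTerm]:
    criteria = {"label": label, "name": name, "uri": uri, "definition": definition}
    given = {field: value for field, value in criteria.items() if value is not None}
    if q is not None and given:
        raise ValueError("q cannot be combined with field-specific criteria")

    def contains(haystack: Optional[str], needle: str) -> bool:
        if haystack is None:
            return False
        if ignore_case:
            haystack, needle = haystack.lower(), needle.lower()
        return needle in haystack

    if q is not None:
        # OR logic: a match in any one field is enough.
        return [t for t in terms
                if any(contains(getattr(t, field), q) for field in criteria)]
    # AND logic: every provided field must match; no criteria matches everything.
    return [t for t in terms
            if all(contains(getattr(t, field), value) for field, value in given.items())]


terms = [
    SketchTerm("dwc:scientificName", "Scientific Taxon Name",
               "The full scientific name.",
               "http://rs.tdwg.org/dwc/terms/scientificName"),
    SketchTerm("dwc:genus", "Genus", "The full scientific name of the genus.",
               "http://rs.tdwg.org/dwc/terms/genus"),
    SketchTerm("soundevent_metrics:f1_score", "F1 Score",
               "The harmonic mean of precision and recall."),
]

any_field = find(terms, q="scientific")
all_fields = find(terms, label="scientific", definition="scientific")
everything = find(terms)
```

In this model the general query returns both Darwin Core terms, because the genus definition mentions "scientific", while the field-specific search returns only the term whose label and definition both match.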

get(key, default=None) #

Retrieve a term by key, returning a default if not found.

Mimics dict.get(): if the key is not found, returns the given default, which is None unless specified.

Parameters:

Name Type Description Default
key str

The key of the term to retrieve.

required
default Any

The value to return if the key is not found. Defaults to None.

None

Returns:

Type Description
Optional[Term]

The Term object or the default value.

Raises:

Type Description
ValueError

If default is neither None nor a Term.

get_by(label=None, name=None, uri=None) #

Retrieve one term by an exact match on a single attribute.

Requires exactly one search criterion and expects exactly one match.

Parameters:

Name Type Description Default
label Optional[str]

The exact label to match.

None
name Optional[str]

The exact name to match.

None
uri Optional[str]

The exact URI to match.

None

Returns:

Type Description
Term

The single Term object that matches.

Raises:

Type Description
ValueError

If zero or more than one criterion is provided.

TermNotFoundError

If no term matches the criterion.

MultipleTermsFoundError

If more than one term matches the criterion.
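
To make the difference from find concrete, the sketch below models the exact-match, single-criterion contract. It is an assumption-laden illustration: SketchTerm, SketchNotFoundError and SketchMultipleFoundError are hypothetical stand-ins for Term, TermNotFoundError and MultipleTermsFoundError, and the duplicate label is contrived.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SketchTerm:
    """Hypothetical stand-in for soundevent.data.Term."""

    name: str
    label: str
    uri: Optional[str] = None


class SketchNotFoundError(Exception):
    """Hypothetical stand-in for TermNotFoundError."""


class SketchMultipleFoundError(Exception):
    """Hypothetical stand-in for MultipleTermsFoundError."""


def get_by(terms: List[SketchTerm], label=None, name=None, uri=None) -> SketchTerm:
    given = {field: value
             for field, value in {"label": label, "name": name, "uri": uri}.items()
             if value is not None}
    if len(given) != 1:
        raise ValueError("Provide exactly one of label, name or uri.")
    ((field, value),) = given.items()
    # Exact equality, not substring matching.
    matches = [t for t in terms if getattr(t, field) == value]
    if not matches:
        raise SketchNotFoundError(f"{field}={value!r}")
    if len(matches) > 1:
        raise SketchMultipleFoundError(f"{field}={value!r}")
    return matches[0]


terms = [
    SketchTerm("dwc:genus", "Genus", "http://rs.tdwg.org/dwc/terms/genus"),
    SketchTerm("custom:genus", "Genus"),  # contrived duplicate label
]


def outcome(**criteria) -> str:
    """Return the matched term name, or the name of the raised error."""
    try:
        return get_by(terms, **criteria).name
    except (ValueError, SketchNotFoundError, SketchMultipleFoundError) as error:
        return type(error).__name__


results = {
    "by name": outcome(name="dwc:genus"),
    "partial label": outcome(label="Gen"),
    "shared label": outcome(label="Genus"),
    "no criterion": outcome(),
    "two criteria": outcome(label="Genus", name="dwc:genus"),
}
```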

remove(key) #

Remove a term from the registry by its key.

Parameters:

Name Type Description Default
key str

The key of the term to remove.

required

Raises:

Type Description
KeyError

If no term is found with the given key.

TermSet #

Bases: BaseModel

A collection of terms and their optional mappings.

Attributes:

Name Type Description
aliases Dict[str, str]

A mapping from a custom key to a term name.

terms List[Term]

A list of term objects.

Attributes#
aliases = Field(default_factory=dict) class-attribute instance-attribute #

A mapping from a custom key to a term name.

terms instance-attribute #

A list of term objects.

Functions#

add_term(term, key=None, term_registry=None, force=False) #

Register a term.

By default, the key is derived from term.name.

Parameters:

Name Type Description Default
term Term

The Term object to add.

required
key Optional[str]

The key to use for registration. If None, term.name is used.

None
term_registry Optional[TermRegistry]

If provided, the term is added to this registry instead of the global one.

None
force bool

If True, overwrite any existing term with the same key.

False

Raises:

Type Description
KeyError

If force is False and the key already exists.

find_term(label=None, name=None, uri=None, definition=None, q=None, ignore_case=True, term_registry=None) #

Find terms by substring match.

If q is provided, it searches label, name, uri, and definition for a match (OR logic). If q is not provided, it searches using the specific fields, requiring all provided fields to match (AND logic). If no arguments are given, all terms are returned.

Parameters:

Name Type Description Default
label Optional[str]

Substring to search for in labels.

None
name Optional[str]

Substring to search for in names.

None
uri Optional[str]

Substring to search for in URIs.

None
definition Optional[str]

Substring to search for in definitions.

None
q Optional[str]

General query string (searches all fields, OR logic).

None
ignore_case bool

Perform case-insensitive search.

True
term_registry Optional[TermRegistry]

If provided, the search is performed on this registry instead of the global one.

None

Returns:

Type Description
List[Term]

A list of matching Term objects.

get_term(key, default=None, term_registry=None) #

Retrieve a term by its key.

Parameters:

Name Type Description Default
key str

The key of the term to retrieve.

required
default Optional[Term]

The value to return if the key is not found.

None
term_registry Optional[TermRegistry]

If provided, the term is retrieved from this registry instead of the global one.

None

Returns:

Type Description
Optional[Term]

The Term object or the default value if not found.

get_term_by(label=None, name=None, uri=None, term_registry=None) #

Retrieve one term by an exact match on a single attribute.

Requires exactly one search criterion and expects exactly one match.

Parameters:

Name Type Description Default
label Optional[str]

The exact label to match.

None
name Optional[str]

The exact name to match.

None
uri Optional[str]

The exact URI to match.

None
term_registry Optional[TermRegistry]

If provided, the search is performed on this registry instead of the global one.

None

Returns:

Type Description
Term

The single Term object that matches.

Raises:

Type Description
ValueError

If zero or more than one criterion is provided.

TermNotFoundError

If no term matches the criterion.

MultipleTermsFoundError

If more than one term matches the criterion.

has_term(key, term_registry=None) #

Check if a term exists in the registry.

Parameters:

Name Type Description Default
key str

The key of the term to check.

required
term_registry Optional[TermRegistry]

If provided, the check is performed on this registry instead of the global one.

None

Returns:

Type Description
bool

True if the term exists, False otherwise.

remove_term(key, term_registry=None) #

Remove a term from the registry by its key.

Parameters:

Name Type Description Default
key str

The key of the term to remove.

required
term_registry Optional[TermRegistry]

If provided, the term is removed from this registry instead of the global one.

None

Raises:

Type Description
KeyError

If no term is found with the given key.

add_terms_from_file(path, term_registry=None, format=None, override_existing=False, ignore_overrides=True, ignore_missing_key=True) #

Load terms from a file and add them to a registry.

This function provides options to handle cases where a term being loaded already exists in the registry, or when a mapping refers to a non-existent term.

The format can be specified explicitly. If not, it will be inferred from the file extension.

Parameters:

Name Type Description Default
path PathLike

The path to the file.

required
term_registry Optional[TermRegistry]

The registry to add the terms to. If None, the global registry is used.

None
format Optional[TermFormat]

The format of the file. If None, it will be inferred from the file extension.

None
override_existing bool

If True, existing terms with the same name will be overwritten. Defaults to False.

False
ignore_overrides bool

If True, and override_existing is False, any term that already exists in the registry will be skipped without raising an error. If False, a TermOverrideError will be raised. Defaults to True.

True
ignore_missing_key bool

If True, any alias in the mapping that refers to a non-existent term will be skipped. If False, a TermNotFoundError will be raised. Defaults to True.

True

Raises:

Type Description
TermOverrideError

If override_existing is False, ignore_overrides is False, and a term or alias being loaded already exists in the registry.

TermNotFoundError

If ignore_missing_key is False and an alias in the mapping refers to a term name that is not found in the registry.

Notes

See soundevent.terms.io for detailed information on the supported JSON and CSV file structures.
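
How the three flags interact may be easier to see in code. The sketch below is one possible reading of the parameter descriptions, not the library's implementation. It skips file parsing entirely, takes already-loaded terms and aliases, and assumes terms are keyed by name. register_loaded and the Sketch* names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SketchTerm:
    """Hypothetical stand-in for soundevent.data.Term."""

    name: str
    label: str


class SketchOverrideError(Exception):
    """Hypothetical stand-in for TermOverrideError."""


class SketchNotFoundError(Exception):
    """Hypothetical stand-in for TermNotFoundError."""


def register_loaded(
    registry: Dict[str, SketchTerm],
    terms: List[SketchTerm],
    aliases: Dict[str, str],
    override_existing: bool = False,
    ignore_overrides: bool = True,
    ignore_missing_key: bool = True,
) -> None:
    def put(key: str, term: SketchTerm) -> None:
        if key in registry and not override_existing:
            if ignore_overrides:
                return  # keep the existing entry and move on
            raise SketchOverrideError(key)
        registry[key] = term

    for term in terms:
        put(term.name, term)

    for alias, term_name in aliases.items():
        if term_name not in registry:
            if ignore_missing_key:
                continue  # skip aliases that point at unknown terms
            raise SketchNotFoundError(term_name)
        put(alias, registry[term_name])


old = SketchTerm("custom:quality", "Old quality")
new = SketchTerm("custom:quality", "New quality")
noise = SketchTerm("custom:noise", "Noise")
loaded_terms = [new, noise]
loaded_aliases = {"noise": "custom:noise", "ghost": "custom:missing"}

kept = {"custom:quality": old}
register_loaded(kept, loaded_terms, loaded_aliases)  # defaults: skip quietly

replaced = {"custom:quality": old}
register_loaded(replaced, loaded_terms, loaded_aliases, override_existing=True)
```

With the defaults the existing entry survives and the dangling alias is dropped; with override_existing=True the loaded term replaces the old one.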

get_global_term_registry() #

Return the current global term registry.

Returns:

Type Description
TermRegistry

The active global TermRegistry instance.

set_global_term_registry(term_registry) #

Set a new global term registry.

This function replaces the existing global registry with a new one. All subsequent calls to functions in this module will operate on the new registry.

Parameters:

Name Type Description Default
term_registry TermRegistry

The new TermRegistry instance to set as the global registry.

required
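
Because replacing the global registry affects every later call, it can be useful to restore the previous one afterwards, for example in tests. The sketch below shows one way to do that with a context manager. temporary_registry is a hypothetical helper, and the getter and setter used here are stand-ins backed by a plain dict so that the example runs without soundevent installed.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

# Stand-in for the module-level state that holds the global registry.
_state: Dict[str, Any] = {"registry": {"dwc:genus": "Genus"}}


def get_registry() -> Any:
    return _state["registry"]


def set_registry(registry: Any) -> None:
    _state["registry"] = registry


@contextmanager
def temporary_registry(
    new_registry: Any,
    getter: Callable[[], Any],
    setter: Callable[[Any], None],
) -> Iterator[Any]:
    previous = getter()
    setter(new_registry)
    try:
        yield new_registry
    finally:
        # Restore the earlier registry even if the block raised.
        setter(previous)


with temporary_registry({"custom:quality": "Quality"}, get_registry, set_registry):
    inside = list(get_registry())

after = list(get_registry())
```

With soundevent available, the documented get_global_term_registry and set_global_term_registry could presumably be passed as the getter and setter, with a TermRegistry instance as the new registry.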

Term Library#

soundevent.terms.library #

Modules:

Name Description
devices
geography
metrics
roi
taxonomy

Attributes:

Name Type Description
accuracy
alternative
average_precision
balanced_accuracy
bandwidth
capture_device
common_name
country
county
duration
elevation
f1_score
family
genus
high_freq
jaccard_index
location_id
low_freq
mean_average_precision
num_segments
order
scientific_name
state_province
taxonomic_class
top_3_accuracy
true_class_probability

Attributes#

accuracy = Term(uri='http://purl.obolibrary.org/obo/STATO_0000415', name='stato:accuracy', label='Accuracy', definition='In the context of binary classification, accuracy is defined as the proportion of true results (both true positives and true negatives) to the total number of cases examined (the sum of true positive, true negative, false positive and false negative). It can be understood as a measure of the proximity of measurement results to the true value. Accuracy is a metric used in the context of classification tasks to evaluate the proportion of correctly predicted instances among the total instances. Key Points: Use Case: Classification performance evaluation. Metric: Measures the proportion of correct predictions. Interpretation: Higher values indicate better classification performance.') module-attribute #

alternative = Term(uri='http://purl.org/dc/terms/alternative', name='dcterms:alternative', label='Alternative', definition='An alternative name for the resource.', scope_note='Can be used to reference an identifier from an external source for a resource within a new collection, acting as a cross-reference.') module-attribute #

average_precision = Term(name='soundevent_metrics:averagePrecision', label='Average Precision', definition='The average precision (AP) is a metric that quantifies the quality of a binary detection task. The AP is defined as the area under the precision-recall curve.') module-attribute #

balanced_accuracy = Term(name='soundevent_metrics:balancedAccuracy', label='Balanced Accuracy', definition='The macro-average of recall scores per class or, equivalently, raw accuracy where each sample is weighted according to the inverse prevalence of its true class. Thus for balanced datasets, the score is equal to accuracy.') module-attribute #

bandwidth = Term(name='soundevent:bandwidth', label='Bandwidth', definition='The difference between the highest and lowest frequency of the sound event.', scope_note='Numeric value in hertz (Hz)') module-attribute #

capture_device = Term(uri='http://rs.tdwg.org/ac/terms/captureDevice', name='ac:captureDevice', label='Capture Device', definition='Free form text describing the device or devices used to create the resource.', scope_note='It is best practice to record the device; this may include a combination such as camera plus lens, or camera plus microscope. Examples: "Canon Supershot 2000", "Makroscan Scanner 2000", "Zeiss Axioscope with Camera IIIu", "SEM (Scanning Electron Microscope)".') module-attribute #

common_name = Term(uri='http://rs.tdwg.org/dwc/terms/vernacularName', name='dwc:vernacularName', label='Common Name', definition='A common or vernacular name.', scope_note='Common (= vernacular) names of the subject in one or several languages. The ISO 639-1 language code SHOULD be given in parentheses after the name if not all names are given by values of the Metadata Language term.', description="The ISO language code after the name should be formatted as in the following example: 'abete bianco (it); Tanne (de); White Fir (en)'. If names are known to be male- or female-specific, this may be specified as in: 'ewe (en-female); ram (en-male);'.") module-attribute #

country = Term(uri='http://rs.tdwg.org/dwc/terms/country', name='dwc:country', label='Country', definition='The name of the country or major administrative unit in which the Location occurs.', scope_note='Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names.') module-attribute #

county = Term(uri='http://rs.tdwg.org/dwc/terms/county', name='dwc:county', label='Second Order Division', definition='The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department, etc.) in which the dcterms:Location occurs.', scope_note='Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.') module-attribute #

duration = Term(uri='http://rs.tdwg.org/ac/terms/mediaDuration', label='Media Duration', name='ac:mediaDuration', definition='The playback duration of an audio or video file in seconds.') module-attribute #

elevation = Term(uri='http://rs.tdwg.org/dwc/terms/verbatimElevation', name='dwc:verbatimElevation', label='Verbatim Elevation', definition='The original description of the elevation (altitude, usually above sea level) of the Location.') module-attribute #

f1_score = Term(name='soundevent_metrics:f1_score', label='F1 Score', definition="The F1 score is the harmonic mean of precision and recall. It is a measure of a test's accuracy that considers both the precision and recall of the test to compute the score. The F1 score is the weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst at 0.") module-attribute #

family = Term(uri='http://rs.tdwg.org/dwc/terms/family', name='dwc:family', label='Family', definition='The full scientific name of the family in which the dwc:Taxon is classified.') module-attribute #

genus = Term(uri='http://rs.tdwg.org/dwc/terms/genus', name='dwc:genus', label='Genus', definition='The full scientific name of the genus in which the dwc:Taxon is classified.') module-attribute #

high_freq = Term(uri='http://rs.tdwg.org/ac/terms/freqHigh', label='Upper frequency bound', name='ac:freqHigh', definition='The highest frequency of the phenomena reflected in the multimedia item or Region of Interest.', scope_note='Numeric value in hertz (Hz)', description='This term refers to the sound events depicted and not to the constraints of the recording medium, so are in principle independent from sampleRate. If dwc:scientificName is specified and if applied to the entire multimedia item, these frequency bounds refer to the sounds of the species given in the dwc:scientificName throughout the whole recording. Although many users will specify both freqLow and freqHigh, it is permitted to specify just one or the other, for example if only one of the bounds is discernible.') module-attribute #

jaccard_index = Term(name='soundevent_metrics:jaccard', label='Jaccard Index', definition='The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard index is defined as the size of the intersection divided by the size of the union of two sample sets.') module-attribute #

location_id = Term(uri='http://rs.tdwg.org/dwc/terms/locationID', name='dwc:locationID', label='Location ID', definition='An identifier for the set of location information (data associated with dcterms:Location). May be a global unique identifier or an identifier specific to the data set.') module-attribute #

low_freq = Term(uri='http://rs.tdwg.org/ac/terms/freqLow', label='Lower frequency bound', name='ac:freqLow', definition='The lowest frequency of the phenomena reflected in the multimedia item or Region of Interest.', scope_note='Numeric value in hertz (Hz)', description='This term refers to the sound events depicted and not to the constraints of the recording medium, so are in principle independent from sampleRate. If dwc:scientificName is specified and if applied to the entire multimedia item, these frequency bounds refer to the sounds of the species given in the dwc:scientificName throughout the whole recording. Although many users will specify both freqLow and freqHigh, it is permitted to specify just one or the other, for example if only one of the bounds is discernible.') module-attribute #

mean_average_precision = Term(name='soundevent_metrics:meanAveragePrecision', label='Mean Average Precision', definition='The mean of the average precision scores per class.', description='The average precision (AP) is a metric that quantifies the quality of a binary detection task. The AP is defined as the area under the precision-recall curve. The mean average precision (mAP) is the mean of the average precision scores per class.') module-attribute #

num_segments = Term(name='soundevent:numSegments', label='Number of Segments', definition='Number of segments that compose the ROI of a sound event.') module-attribute #

order = Term(uri='http://rs.tdwg.org/dwc/terms/order', name='dwc:order', label='Order', definition='The full scientific name of the order in which the dwc:Taxon is classified.') module-attribute #

scientific_name = Term(uri='http://rs.tdwg.org/dwc/terms/scientificName', name='dwc:scientificName', label='Scientific Taxon Name', definition='The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.', scope_note='Scientific names of taxa represented in the media resource (with date and name authorship information if available) of the lowest level taxonomic rank that can be applied.', description='The Scientific Taxon Name may possibly be of a higher rank, e.g., a genus or family name, if this is the most specific identification available. Where multiple taxa are the subject of the media item, multiple names may be given. If possible, add this information here even if the title or caption of the resource already contains scientific taxon names. Where the list of scientific taxon names is impractically large (e.g., media collections or identification tools), the number of taxa should be given in Taxon Count (see below). If possible, avoid repeating the Taxonomic Coverage here. Do not use abbreviated Genus names ("P. vulgaris"). It is recommended to provide author citation to scientific names, to avoid ambiguities in the presence of homonyms (the same name created by different authors for different taxa). Identifier qualifications should be supplied in the Identification Qualifier term rather than here (i. e. "Abies cf. alba" is deprecated, to be replaced with Scientific Taxon Name = "Abies alba" and Identification Qualifier = "cf. species")') module-attribute #

soundevent_term_set = TermSet(terms=[accuracy, alternative, average_precision, balanced_accuracy, bandwidth, capture_device, common_name, country, county, duration, elevation, f1_score, family, genus, high_freq, jaccard_index, location_id, low_freq, mean_average_precision, num_segments, order, scientific_name, state_province, taxonomic_class, top_3_accuracy, true_class_probability], aliases={'species': scientific_name.name, 'genus': genus.name, 'family': family.name, 'order': order.name, 'common_name': common_name.name, 'class': taxonomic_class.name, 'duration': duration.name, 'low_freq': low_freq.name, 'high_freq': high_freq.name, 'location_id': location_id.name, 'site_id': location_id.name, 'country': country.name, 'state': state_province.name}) module-attribute #

state_province = Term(uri='http://rs.tdwg.org/dwc/terms/stateProvince', name='dwc:stateProvince', label='First Order Administrative Division', definition='The name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the dcterms:Location occurs.', scope_note='Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.') module-attribute #

taxonomic_class = Term(uri='http://rs.tdwg.org/dwc/terms/class', name='dwc:class', label='Class', definition='The full scientific name of the class in which the dwc:Taxon is classified.') module-attribute #

top_3_accuracy = Term(name='soundevent_metrics:top3Accuracy', label='Top 3 Accuracy', definition='The proportion of samples where the true class is in the top 3 predicted classes.') module-attribute #

true_class_probability = Term(name='soundevent_metrics:trueClassProbability', label='True Class Probability', definition='The model probability assigned to the true class.') module-attribute #
