MerlinVision
Annotation
Annotating audio? Continue on!
Annotating photos? Use the photo annotation guide instead.
Or see the Spanish version of this guide.
Merlin Vision
Sound Annotation Quick Start
© Bryan Calk / Macaulay Library
What is Merlin Sound ID?
A tool to help people become better at identifying bird sounds
while powering…
© Pablo Re / Macaulay Library
How do we make Merlin Sound ID?
Raw Recordings → Annotated Recordings → Merlin Sound ID
How can I contribute?
Record Sounds → Raw Recordings
Annotate Sounds → Annotated Recordings → Merlin Sound ID
What are annotations? Time + Frequency + Identification
Annotations are boxes drawn on spectrograms defining the time span, frequency range, and identification of sounds
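If you ever handle annotations in a script, it can help to think of each box as a small record holding exactly those three parts. The sketch below is only an illustration under stated assumptions: the class name `AnnotationBox`, its field names and the units (seconds, Hz) are hypothetical and are not the annotation portal's actual export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationBox:
    """Hypothetical record for one box drawn on a spectrogram."""
    start_s: float   # time span: start, in seconds (assumed unit)
    end_s: float     # time span: end, in seconds
    low_hz: float    # frequency range: lower edge, in Hz (assumed unit)
    high_hz: float   # frequency range: upper edge, in Hz
    label: str       # identification, e.g. a species name

    def __post_init__(self) -> None:
        # A box must have positive extent in both time and frequency.
        if not (0 <= self.start_s < self.end_s):
            raise ValueError("start_s must be >= 0 and less than end_s")
        if not (0 <= self.low_hz < self.high_hz):
            raise ValueError("low_hz must be >= 0 and less than high_hz")

    @property
    def duration_s(self) -> float:
        """Length of the box's time span."""
        return self.end_s - self.start_s


# Usage: one Target box on the example recording (values made up).
box = AnnotationBox(start_s=12.4, end_s=13.9, low_hz=2500, high_hz=6000,
                    label="Chivi Vireo")
print(f"{box.label}: {box.duration_s:.2f} s")
```

Making the record frozen keeps a box from being edited by accident after it has been validated.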
Why are annotations important?
They train the Merlin Sound ID model to recognize:
Other Bird Sounds
Ambient Environmental Noise
Target Sounds
5-Step Annotation Protocol
Annotation (Steps 1-3):
Step 1 - Box Target Sounds
Step 2 - Box Other Bird Sounds
Step 3 - Box Background No Bird
Effort (Steps 4-5):
Step 4 - Set Target Effort
Step 5 - Set Other Birds Effort
ML 2000001 - Chivi Vireo (Vireo chivi)
Step 1 - Target Sounds
Annotate 5 sounds of the target species.
Rationale: Sound ID works best when it is trained with sounds that capture the diverse array of acoustic scenarios it will encounter in the field, due to variation in 1) the target species’ repertoire and 2) ambient noise. We have found that five target boxes per recording strikes a good balance between capturing enough variation within each recording and being able to annotate more recordings efficiently.
Step 2 - Other Bird Sounds
Annotate all Other Bird sounds within 3 seconds of every Target sound box.*
*This includes Other Bird sounds that occur in the same time window as a Target box.
Rationale: Sound ID analyzes spectrograms in 3s segments. The 3s Other Birds buffer around each Target ensures that all sounds within 3s of the Target box are identified.
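Annotators apply this rule by eye, but the logic is easy to state precisely: pad each Target box by 3 s on both sides, clip to the recording, and box every Other Bird sound that overlaps a padded window. The sketch below assumes boxes are simple (start, end) pairs in seconds; the helper names `buffer_windows` and `needs_other_bird_box` are hypothetical and not part of any Merlin tool.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s), assumed to be in seconds

BUFFER_S = 3.0  # Other Birds buffer from Step 2 (3 s on each side)


def buffer_windows(targets: List[Interval], duration_s: float,
                   buffer_s: float = BUFFER_S) -> List[Interval]:
    """Pad each Target box by buffer_s on both sides, clip to the
    recording, and merge windows that overlap."""
    padded = sorted(
        (max(0.0, start - buffer_s), min(duration_s, end + buffer_s))
        for start, end in targets
    )
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def needs_other_bird_box(sound: Interval, windows: List[Interval]) -> bool:
    """True if an Other Bird sound overlaps any padded Target window.
    A sound that only touches a window edge is not counted."""
    start, end = sound
    return any(start < w_end and end > w_start for w_start, w_end in windows)


# Usage: two Target boxes in a 60 s recording (values made up).
windows = buffer_windows([(10.0, 11.5), (14.0, 15.0)], duration_s=60.0)
for other in [(5.0, 6.5), (16.0, 17.0), (20.0, 21.0)]:
    print(other, needs_other_bird_box(other, windows))
```

Merging the padded windows means two Target boxes close together share one stretch of audio to check, rather than two overlapping ones.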
Step 3 - Background No Bird
Add one Background No Bird box, if possible
Rationale: This helps prevent Sound ID from incorrectly learning an ambient background noise as a bird species’ vocalization. If not told otherwise, Sound ID will associate the most dominant signal in the audio with the Target species. For example, stream noise is always present in American Dipper (Cinclus mexicanus) recordings, and the model learned to predict American Dipper every time it heard a stream: it focused on the dominant signal that happened to be present in the data, but it was the wrong signal! By adding Background No Bird boxes to all the Dipper recordings, we were able to correct this glitch.
Steps 4 & 5 - Effort
If you completed Steps 1-3, then you've done the default Standard Effort for Targets and Other Birds.
Congratulations!
Click to annotate the next file
If you annotated all Target sounds or all Other Bird sounds, assign Effort in Steps 4 & 5.
Step 4 - Target Effort
Did you annotate all Target sounds in the recording?
Yes - select “All are boxed”
No - Don’t do anything! Leave at “Standard Effort (tried to box 5+)”
Pro tip: If you know you did the Default Effort for both Targets and Other Birds, just click in the home screen
Step 5 - Other Birds Effort
Did you annotate all* Other Birds sounds in the recording?
Yes - select “All are boxed”
No - Don’t do anything! Leave at “All birds that vocalize near the target species are boxed”
*Important: If there were no Other Birds anywhere in the recording, you still choose “All are boxed”
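The Effort choices in Steps 4 and 5 boil down to two small decisions. The sketch below encodes them as functions so the one non-obvious case (no Other Birds at all still means “All are boxed”) is explicit. The option strings are copied from this guide; the function names are hypothetical and do not correspond to any portal API.

```python
# Option labels as quoted in Steps 4 and 5 of this guide.
TARGET_ALL = "All are boxed"
TARGET_STANDARD = "Standard Effort (tried to box 5+)"
OTHERS_ALL = "All are boxed"
OTHERS_STANDARD = "All birds that vocalize near the target species are boxed"


def target_effort(all_targets_boxed: bool) -> str:
    """Step 4: change the setting only if every Target sound is boxed."""
    return TARGET_ALL if all_targets_boxed else TARGET_STANDARD


def other_birds_effort(all_others_boxed: bool, any_other_birds: bool) -> str:
    """Step 5: 'All are boxed' also applies when the recording has
    no Other Birds anywhere."""
    if all_others_boxed or not any_other_birds:
        return OTHERS_ALL
    return OTHERS_STANDARD


# Usage: a recording with no Other Birds where only 5 Targets were boxed.
print(target_effort(all_targets_boxed=False))
print(other_birds_effort(all_others_boxed=False, any_other_birds=False))
```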
How do I box sounds? 1 of 6
[Figure: examples of a Good Box :) and a Bad Box :(]
Include harmonics.
Do not include echoes.
How do I box sounds? 2 of 6
Notes <1 sec apart: Long Box (one box around all the notes)
Notes >1 sec apart: Short Boxes (box the notes separately)
Rationale: Sound ID analyzes spectrograms in 3s segments. By ensuring that there are no 3s gaps between sounds in a box, the 1s rule prevents having 3s segments where the model thinks a species’ sound should be present but for which no sound actually exists.
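The 1 s rule amounts to a simple grouping pass over the notes in time order: a note that starts less than 1 s after the current box ends extends that box; otherwise it starts a new one. This is a sketch under assumptions: notes are (start, end) pairs in seconds, the name `group_notes` is hypothetical, and because the guide does not say what to do with a gap of exactly 1 s, this sketch starts a new box in that case.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one note, in seconds

MAX_GAP_S = 1.0  # the 1 s rule


def group_notes(notes: List[Interval],
                max_gap_s: float = MAX_GAP_S) -> List[Interval]:
    """Merge notes separated by less than max_gap_s into one long box;
    a gap of max_gap_s or more starts a new (short) box."""
    boxes: List[Interval] = []
    for start, end in sorted(notes):
        if boxes and start - boxes[-1][1] < max_gap_s:
            # Gap under 1 s: extend the current box.
            boxes[-1] = (boxes[-1][0], max(boxes[-1][1], end))
        else:
            boxes.append((start, end))
    return boxes


# Usage: three notes in quick succession, then one note 2 s later.
print(group_notes([(0.0, 0.3), (0.8, 1.1), (1.5, 1.9), (4.0, 4.4)]))
```

Since every gap left inside a box is under 1 s, no 3 s segment that falls entirely within a box can be silent, which is the point of the rationale above.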
How do I box sounds? 3 of 6
Overlapping Targets
Case 1: Large flock calling at the same time
Rationale: Many species are commonly recorded calling at the same time in large flocks. We want to train Sound ID to recognize the acoustic signature of such calling flocks and not just single birds. In practice, it is also very hard to isolate individual calls!
How do I box sounds? 4 of 6
Overlapping Targets
Case 2: A few individuals with overlapping sounds
Box each individual’s sounds separately, if possible.
Rationale: If a human ear can reliably distinguish the individuals’ sounds, we want Sound ID to be able to do so too.
How do I box sounds? 5 of 6
Unknown Bird & non-species taxa labels
Examples: Thraupis sp., Aechmophorus occidentalis/clarkii
Rationale: Sound ID should only be trained to recognize sounds that human experts consider distinctive and identifiable. For many closely related bird species, certain calls have not been differentiated.
Pro Tips
Do not annotate if one of these problems exists.*
*See Appendix 1 for more details.
In the online annotation portal, you can sort a species’ media list by country.
What are the Test/Train/Reference categories?
Reference audio - Curated files featured in Merlin for reference. These should all be completely annotated (Effort All/All).
Test* - Aim for 20 files per species. These files are used to evaluate how well Sound ID is working for a particular bird.
Train* - Aim for 80 files per species. These files are used to teach Sound ID what this particular bird sounds like.
*Annotate Test and Train files the same way, following the 5-Step protocol.
Nuance 1 - Bird sounds in voice announcements
Box sounds in voice announcements if your selected effort level requires it.
Rationale: People talking while using Sound ID is common. We want to train Sound ID to recognize sounds overlapping with human voices.
Nuance 2 - Non-bird sounds
Nuance 3 - Additional Target Boxing
Rationale: This avoids cascading boxing, which happens when a target vocalization you annotate is followed by another one in the 3s trailing buffer designated for Other Birds, so that each new Target box brings its own buffer to annotate.
Why am I annotating so many smartphone recordings?
Rationale: Low-quality recordings are typical of smartphones and can be difficult to annotate (sorry!). Nonetheless, smartphone recordings are the most likely type of recording to be presented to Sound ID. Bird sounds that would be easily identifiable in a high-quality recording are often faint, indistinct, or obscured by the sounds of other birds, animals, or ambient noise (cars, rivers, human voices, handling). Thus, high-quality annotations of low-quality recordings are crucial for improving the performance of Sound ID because they reflect real-world acoustic scenarios.
Sound ID operates on smartphones. Smartphone microphones are low quality; therefore, we must train Sound ID to perform well on low-quality recordings.
[Figure: example recordings labeled "Preferable" (high quality) and "Typical" (smartphone)]
Thank you for contributing to Merlin Sound ID!
© San Shaw / Macaulay Library
© Bryan Calk / Macaulay Library
END
Appendix 1 - Problems 1/7
Confusion with other species or too much overlap: Overlapping, non-distinguishable vocalizations from many species (e.g. noisy dawn chorus)
Some recordings are so full of birds that boxing individuals is very challenging! If this prevents accurate annotation, OR if multiple similar-sounding species are known to be present in the recording (e.g. a flock of parrots), mark this problem type, then skip the file.
Appendix 1 - Problems 2/7
If the target species does not seem to be present or the recording is misidentified, don’t annotate it; mark this problem instead. If you are sure, please also report it in eBird/ML.
Appendix 1 - Problems 3/7
Very poor quality or excessive editing
If you can’t see the target bird’s sounds clearly on the spectrogram, just mark the file as a Problem and skip it. We can always return to such files in the future.
Appendix 1 - Problems 4/7
Very poor quality or excessive editing
If the recording has been noise-reduced or otherwise altered so that it sounds noticeably unnatural, it should be marked as a Problem and not annotated.
Appendix 1 - Problems 5/7
Unusual or rare vocalization (begging, distress, etc.)
Don’t annotate obscure sounds that you wouldn’t often hear in the field, such as begging or distress calls. As usual, indicate the problem type before skipping.
Appendix 1 - Problems 6/7
Uncommon mimicry (not part of a species’ standard repertoire)
Do annotate mimicry when: it’s a typical part of a bird’s sounds, for example mixed into song (Bluethroat, Northern Mockingbird).
Don’t annotate mimicry when: it’s a less common sound a bird makes, and it might be hard to identify without context (a jay imitating a raptor).
Mark the cases of mimicry you don’t annotate as a Problem - “Unusual or rare.”
Appendix 1 - Problems 7/7
Player or tool issue
If the file won’t play, says it has been deleted, or the spectrogram is out of sync with the audio, mark this Problem.