1 of 34

Merlin Vision

Annotation

Annotating audio? Continue on!

Annotating photos? Click here.

or go to the Spanish version here

2 of 34

Merlin Vision

Sound Annotation Quick Start

© Bryan Calk / Macaulay Library

Spanish version here

3 of 34

What is Merlin Sound ID?

A tool to help people become better at identifying bird sounds while powering…

  • Engagement - helps improve birder ID skills + community awareness of birds

  • Basic Research - expands analytical capabilities

  • Conservation - supports acoustic monitoring efforts and conservation decision making

© Pablo Re / Macaulay Library

4 of 34

How do we make Merlin Sound ID?

Raw Recordings

Annotated Recordings

Merlin Sound ID

5 of 34

How can I contribute?

Raw Recordings

Annotated Recordings

Merlin Sound ID

Record Sounds

  • More than 100 recordings are needed per species
  • Only 20% of bird species have enough recordings
  • Learn how to make bird sound recordings here or here and upload them to eBird here

Annotate Sounds

  • Listen to and identify sounds from your region of expertise
  • Read on for more information on how to annotate with Merlin Vision

6 of 34

What are annotations? Time + Frequency + Identification

Annotations are boxes drawn on spectrograms defining the time span, frequency range, and identification of sounds
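As a rough illustration of the three parts of an annotation, a box could be modeled as a small record. This is only a sketch: the class name `Annotation`, its field names, and the example values are hypothetical and are not taken from the Merlin Vision tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One box drawn on a spectrogram (hypothetical structure)."""
    start_s: float   # time span: left edge, seconds from start of recording
    end_s: float     # time span: right edge
    low_hz: float    # frequency range: bottom edge
    high_hz: float   # frequency range: top edge
    label: str       # identification, e.g. a species name

    def __post_init__(self):
        # A box must have positive width and height.
        if self.end_s <= self.start_s:
            raise ValueError("end_s must be greater than start_s")
        if self.high_hz <= self.low_hz:
            raise ValueError("high_hz must be greater than low_hz")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# Illustrative values only.
box = Annotation(1.2, 2.9, 2000.0, 6500.0, "Chivi Vireo (Vireo chivi)")
```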

7 of 34

Why are annotations important?

They train the Merlin Sound ID model to recognize …

Other Bird Sounds

Ambient Environmental Noise

Target Sounds

8 of 34

5-Step Annotation Protocol

Step 1 - Box Target Sounds

Step 2 - Box Other Bird Sounds

Step 3 - Box Background No Bird

Step 4 - Set Target Effort

Step 5 - Set Other Birds Effort

Annotation

Effort

ML 2000001 - Chivi Vireo (Vireo chivi)

9 of 34

Step 1 - Target Sounds

ML 2000001 - Chivi Vireo (Vireo chivi)

Annotate 5 sounds of the target species

  • if there are fewer than 5, annotate all target sounds

Rationale: Sound ID works best when it is trained with sounds that capture the diverse array of acoustic scenarios Sound ID will encounter in the field due to variation in the 1) target species’ repertoire and 2) ambient noise. We have found that five target boxes per recording is a good balance between capturing enough variation within each recording while also allowing us to efficiently annotate more recordings.

10 of 34

Step 2 - Other bird sounds

Annotate all Other Bird sounds within 3 seconds of all Target sound boxes

*Box Other Bird sounds that occur in the same time window as a Target box

Rationale: Sound ID analyzes spectrograms in 3s segments. The 3s Other Birds buffer around each Target ensures that all sounds within 3s of the Target box are identified.
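The 3s buffer rule can be sketched as a simple interval check. This is a minimal sketch under assumptions: the function name `needs_boxing` and the `(start_s, end_s)` tuple representation are hypothetical, and the check treats a sound as "within 3 seconds" if any part of it falls inside the buffered window around a Target box.

```python
BUFFER_S = 3.0  # buffer around each Target box, from the rule above


def needs_boxing(sound, targets, buffer_s=BUFFER_S):
    """Return True if an Other Bird sound should be boxed.

    `sound` and each item in `targets` are (start_s, end_s) tuples.
    A sound qualifies if it overlaps a Target box or lies within
    `buffer_s` seconds before or after one.
    """
    s0, s1 = sound
    return any(
        s0 <= t1 + buffer_s and s1 >= t0 - buffer_s
        for t0, t1 in targets
    )


targets = [(10.0, 12.0)]
print(needs_boxing((13.0, 14.0), targets))   # starts 1s after the Target box
print(needs_boxing((15.5, 16.0), targets))   # starts 3.5s after the Target box
```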

ML 2000001 - Chivi Vireo (Vireo chivi)

11 of 34

Step 3 - Background No Bird

Add one Background No Bird box, if possible

  • should not include any audible bird sounds of any species (listen!)
  • can include any non-bird sound (e.g. frogs, insects, rain, handling noise, wind)
  • 1s minimum, 3s+ preferred
  • skip if there are no 1s “empty” sections with no bird sounds
  • Place one to three Background No Bird boxes, if possible
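The duration and "no bird sounds" criteria above can be expressed as a small check. This is a hedged sketch, not part of the annotation tool: the function name `check_background_box`, the return values, and the `(start_s, end_s)` tuples for known bird sounds are all assumptions made for illustration. Listening remains the real test, since faint bird sounds may not be boxed anywhere.

```python
MIN_S = 1.0        # minimum Background No Bird box length
PREFERRED_S = 3.0  # preferred length or longer


def check_background_box(start_s, end_s, bird_sounds):
    """Classify a candidate Background No Bird box.

    Returns "invalid" if the box is shorter than 1s or overlaps any
    known bird sound, "preferred" if it is 3s or longer, else "ok".
    `bird_sounds` is a list of (start_s, end_s) tuples.
    """
    if end_s - start_s < MIN_S:
        return "invalid"
    for b0, b1 in bird_sounds:
        if start_s < b1 and end_s > b0:  # intervals overlap
            return "invalid"
    return "preferred" if end_s - start_s >= PREFERRED_S else "ok"


birds = [(10.0, 12.0)]
print(check_background_box(20.0, 23.5, birds))
```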

ML 2000001 - Chivi Vireo (Vireo chivi)

Rationale: Helps prevent Sound ID from incorrectly learning an ambient background noise as a bird species vocalization. If not told otherwise, Sound ID will associate the most dominant signal in the audio with the Target species. For example, stream noise is always present in American Dipper (Cinclus mexicanus) recordings and the model learned to predict American Dipper every time it heard a stream; it focused on the dominant signal that happened to be present in the data, but it was the wrong signal! By adding Background No Bird boxes to all the Dipper recordings, we were able to correct this glitch.

12 of 34

Steps 4 & 5 - Effort

If you accomplished Steps 1-3 then you've done the default Standard Effort for Targets and Other Birds.

Congratulations!

Click to annotate the next file

If you annotated the sounds of all Targets or all Other Birds, assign Effort in Steps 4 & 5

13 of 34

Step 4 - Target Effort

Did you annotate all Target sounds in the recording?

Yes - select “All are boxed”

No - Don’t do anything! Leave at “Standard Effort (tried to box 5+)”

Pro tip: If you know you did the Default Effort for both Targets and Other Birds, just click in the home screen

14 of 34

Step 5 - Other Birds Effort

Did you annotate all* Other Birds sounds in the recording?

Yes - select “All are boxed”

No - Don’t do anything! Leave at “All birds that vocalize near the target species are boxed”

*Important: If there were no Other Birds anywhere in the recording, you still choose “All are boxed”
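The Effort choices in Steps 4 and 5 amount to two small decisions, sketched below. The function and parameter names are hypothetical; the returned strings are the option labels quoted on these slides.

```python
def target_effort(all_targets_boxed: bool) -> str:
    """Step 4: pick the Target Effort option."""
    if all_targets_boxed:
        return "All are boxed"
    # Default option: leave unchanged.
    return "Standard Effort (tried to box 5+)"


def other_birds_effort(all_other_birds_boxed: bool,
                       other_birds_present: bool) -> str:
    """Step 5: pick the Other Birds Effort option.

    If no Other Birds occur anywhere in the recording, the answer is
    still "All are boxed".
    """
    if all_other_birds_boxed or not other_birds_present:
        return "All are boxed"
    # Default option: leave unchanged.
    return "All birds that vocalize near the target species are boxed"


print(target_effort(False))
print(other_birds_effort(False, other_birds_present=False))
```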

Pro tip: If you know you did the Default Effort for both Targets and Other Birds, just click in the home screen

15 of 34

How do I box sounds? 1 of 6

Good Box :)

Bad Box :(

Include Harmonics

Do Not Include Echoes

16 of 34

How do I box sounds? 2 of 6

Long Box

Notes <1 sec apart

Short Box

Notes >1 sec apart

Rationale: Sound ID analyzes spectrograms in 3s segments. By ensuring that there are no 3s gaps between sounds in a box, the 1s rule prevents having 3s segments where the model thinks a species’ sound should be present but for which no sound actually exists.
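The 1s rule can be pictured as merging neighboring notes into boxes. This is a minimal sketch: `group_notes` and the `(start_s, end_s)` tuples are hypothetical, and treating a gap of exactly 1s as a break is an assumption, since the slide only covers gaps under and over 1s.

```python
def group_notes(notes, max_gap_s=1.0):
    """Merge notes less than `max_gap_s` apart into one box.

    `notes` is a list of (start_s, end_s) tuples; the result is a
    list of (start_s, end_s) boxes in time order.
    """
    boxes = []
    for start, end in sorted(notes):
        if boxes and start - boxes[-1][1] < max_gap_s:
            # Gap under 1s: extend the current (long) box.
            boxes[-1] = (boxes[-1][0], max(boxes[-1][1], end))
        else:
            # Gap of 1s or more: start a new (short) box.
            boxes.append((start, end))
    return boxes


notes = [(0.0, 0.3), (0.8, 1.1), (2.5, 2.8), (3.0, 3.4)]
print(group_notes(notes))  # [(0.0, 1.1), (2.5, 3.4)]
```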

17 of 34

How do I box sounds? 3 of 6

Overlapping Targets

Rationale: Many species are commonly recorded calling at the same time in large flocks. We want to train Sound ID to recognize the acoustic signature of such calling flocks and not just single birds. In practice, it is also very hard to isolate individual calls!

Case 1: Large flock calling at the same time

  • Box continuous sections of sounds from a single species
  • Do not try to isolate individual birds

18 of 34

How do I box sounds? 4 of 6

Overlapping Targets

Case 2: A few individuals with overlapping sounds

Box each individual’s sounds separately, if possible

Rationale: If individuals’ sounds can be reliably distinguished by the human ear, that is also what we want Sound ID to be able to do.

19 of 34

How do I box sounds? 5 of 6

Unknown Bird & non-species taxa labels

Examples: Thraupis sp., Aechmophorus occidentalis/clarkii

  • Use these labels if sound is not identifiable to species
  • Choose the narrowest possible taxon label

Rationale: Sound ID should only be trained to recognize sounds that human experts consider distinctive and identifiable. For many closely related bird species, certain calls have not been differentiated.

20 of 34

Pro Tips

  • Annotation works best in Google Chrome and on a large monitor.

  • Hold down the left/right arrow key to move through a spectrogram. This is much faster than listening to the entire recording.

  • Ctrl + click (or right-click) on the spectrogram to duplicate the last box you created. Useful for repetitive singers such as vireos.

  • Focus on recordings from your region of expertise.* You will work faster with soundscapes you know, leading to more Other Birds identified and higher quality annotations.

*In the online annotation portal you can sort a species’ media list by country

21 of 34

Do not annotate if one of these problems exists*

    • ID problem or no target species present

    • Overlapping non-distinguishable vocalizations from many species (noisy dawn chorus)

    • Recordings of rare sounds (e.g. begging calls) - ask yourself, what would you want Merlin to be able to ID in the field?

    • Uncommon mimicry (not part of a species’ standard repertoire)

*See Appendix 1 for more details

22 of 34

What are the Test/Train/Reference categories?

Reference audio - Curated files featured in Merlin for reference. These should all be completely annotated (Effort All/All)

Test* - Target 20 files per species. These files are used to evaluate how well Sound ID is working for a particular bird.

Train* - Target 80 files per species. These files are used to teach Sound ID what this particular bird sounds like.

*Test and Train files should be annotated the same way, following the 5-Step protocol.

23 of 34

Nuance 1 - Bird sounds in voice announcements

Box sounds in voice announcements if necessary for your selected effort level

  • Example 1 - 5th target sound is in voice announcement

  • Example 2 - If Effort is All/All, all Target and Other Bird sounds in voice announcements should be boxed

Rationale: People talking while using Sound ID is common. We want to train Sound ID to recognize sounds overlapping with human voices.

24 of 34

Nuance 2 - Non-bird sounds

  • Annotating non-bird sounds is a very low priority

  • Box chickens/roosters as Red Junglefowl (Gallus gallus)

Nuance 3 - Additional Target Boxing

  • Do not box additional target sounds that appear within 3s of the final box (the farthest right box)

Rationale: This avoids cascading boxing, in which you annotate a target vocalization, another one occurs in the 3s trailing buffer designated for Other Birds, and boxing that one would in turn require a new buffer.


25 of 34

Why am I annotating so many smartphone recordings?

Rationale: Low-quality recordings are typical of smartphones and can be difficult to annotate (sorry!). Nonetheless, smartphone recordings are the most likely type of recording to be presented to Sound ID. Bird sounds that would be easily identifiable in a high-quality recording are often faint, indistinct, or obscured by the sounds of other birds, animals, or ambient noise (cars, rivers, human voices, handling). Thus, high-quality annotations of low-quality recordings are crucial to improving the performance of Sound ID because they reflect real-world acoustic scenarios.

Sound ID operates on smartphones. Smartphone microphones are low quality; therefore, we must train Sound ID to perform well on low-quality recordings.

Preferable

Typical

26 of 34

Thank you for contributing to Merlin Sound ID!

© San Shaw / Macaulay Library

© Bryan Calk / Macaulay Library

27 of 34

END

28 of 34

Appendix 1 - Problems 1/7

Confusion with other species or too much overlap: Overlapping, non-distinguishable vocalizations from many species

(e.g. noisy dawn chorus)

Some recordings are so full of birds that boxing individuals is very challenging! If this prevents accurate annotation, OR if multiple similar-sounding species are known to be present in the recording (e.g. a flock of parrots), mark this problem type, then skip the file.

29 of 34

Appendix 1 - Problems 2/7

If the target species does not seem to be present or the recording is misidentified, don’t annotate but mark this problem. Please also report in eBird/ML if you are sure.

30 of 34

Appendix 1 - Problems 3/7

Very poor quality or excessive editing

If you can’t see the target bird clearly, just mark the file as a problem and skip it. We can always return to these files in the future.

31 of 34

Appendix 1 - Problems 4/7

Very poor quality or excessive editing

If the recording has been noise-reduced or otherwise altered so that it sounds noticeably unnatural, it should be marked as a Problem and not annotated.

32 of 34

Appendix 1 - Problems 5/7

Unusual or rare vocalization (begging, distress, etc.)

Don’t annotate obscure sounds that you wouldn’t often hear in the field, such as begging or distress calls. As usual, indicate the problem type before skipping.

33 of 34

Appendix 1 - Problems 6/7

Uncommon mimicry (not part of a species’ standard repertoire)

Do annotate mimicry when: it’s a typical part of a bird’s sounds, for example mixed into song (Bluethroat, Northern Mockingbird).

Don’t annotate mimicry when: it’s a less common sound a bird makes, and it might be hard to identify without context (a jay imitating a raptor).

Mark the cases of mimicry you don’t annotate as a Problem - “Unusual or rare.”

34 of 34

Appendix 1 - Problems 7/7

Player or tool issue

If the file won’t play, says it has been deleted, or the spectrogram is out of sync with the audio, mark this Problem.