1 of 17

Hidden Echoes Survive Generative Audio Instrument Training

Chris Tralie, Ursinus College

Matt Amery

Create & Innovate UK

Ian Utz, Ben Douglas

Ursinus College

2 of 17

Motivation

3 of 17

Motivation

4 of 17

Motivation: Where Is Training Data Used?

  • Want to do the same for audio-to-audio style transfer models with hidden signals

Example courtesy of Josh Brown in CS 372, spring 2023

https://ursinus-cs372-s2023.github.io/CoursePage/Assignments/HW6_StringAlong/

Engel, Jesse, Chenjie Gu, and Adam Roberts. "DDSP: Differentiable Digital Signal Processing." International Conference on Learning Representations. 2019.

5 of 17

Echo Hiding: A Simple Classical Idea

Gruhl, Daniel, Anthony Lu, and Walter Bender. "Echo hiding." Information Hiding: First International Workshop Cambridge, UK, May 30–June 1, 1996 Proceedings 1. Springer Berlin Heidelberg, 1996.
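A minimal sketch of single-echo hiding, assuming the watermark is one delayed, attenuated copy of the signal added back onto itself. The function name `add_echo` and the default amplitude `alpha=0.4` are illustrative choices, not values taken from the slides.

```python
import numpy as np

def add_echo(x, delay, alpha=0.4):
    """Return y[n] = x[n] + alpha * x[n - delay], same length as x.

    delay is in samples and must be >= 1; alpha is the echo amplitude.
    Both defaults are assumptions made for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[delay:] += alpha * x[:-delay]  # delayed, scaled copy of the input
    return y
```

For example, `add_echo(audio, delay=75)` embeds an echo 75 samples behind the original signal.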

6 of 17

Uncovering Hidden Echoes via Cepstrum
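A hedged sketch of cepstrum-based echo detection: an echo at delay d multiplies the spectrum by a periodic ripple, which shows up as a peak at quefrency d in the real cepstrum. The z-score below compares the cepstrum value at a candidate delay with the other values in a search window; the function names and the window bounds `lo` and `hi` are assumptions for illustration.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(np.asarray(x, dtype=float))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(x))

def echo_z_score(x, delay, lo=25, hi=150):
    """Z-score of the cepstrum at `delay` against the other quefrencies
    in [lo, hi). The window bounds are illustrative assumptions."""
    c = real_cepstrum(x)
    others = np.array([c[k] for k in range(lo, hi) if k != delay])
    return (c[delay] - others.mean()) / others.std()
```

A large positive z-score at the embedded delay suggests the echo is present; a clean signal should give a z-score near zero at that delay.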

7 of 17

Examples: Watermarking RAVE

  • RAVE [1] trained on 3 hours of acoustic guitar data [2]
  • Original

[1] Caillon, Antoine, and Philippe Esling. "RAVE: A variational autoencoder for fast and high-quality neural audio synthesis." arXiv preprint arXiv:2111.05011 (2021).

[2] Xi, Qingyang, et al. "GuitarSet: A Dataset for Guitar Transcription." ISMIR. 2018.

Audio examples: Clean, 50, 75, 100

8 of 17

Examples: Watermarking Dance Diffusion

  • Dance Diffusion [3] trained on the Groove drum dataset [4]
  • Original

[3] Evans, Z. 2022. Dance Diffusion. https://github.com/harmonai-org/sample-generator.

[4] Gillick, J.; Roberts, A.; Engel, J.; Eck, D.; and Bamman, D. 2019. Learning to Groove with Inverse Sequence Transformations. In International Conference on Machine Learning (ICML).

Audio examples: Clean, 50, 75, 100

9 of 17

Single Echo Results

10 of 17

Pseudorandom Time-Spread Echo Patterns

Ko, B.-S.; Nishimura, R.; and Suzuki, Y. 2005. Time-spread echo method for digital audio watermarking. IEEE Transactions on Multimedia, 7(2): 212–221.
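A rough sketch of the time-spread echo idea, assuming the echo kernel is a low-amplitude pseudorandom ±1 sequence placed at a delay, and that detection correlates the real cepstrum with the same sequence at each candidate lag. All names, the sequence length, and `alpha=0.01` are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np

def pn_sequence(length, seed=0):
    """Pseudorandom sequence of +1/-1 values (seed is the shared key)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=length)

def add_time_spread_echo(x, pn, delay, alpha=0.01):
    """Convolve x with the kernel delta[n] + alpha * pn[n - delay]."""
    x = np.asarray(x, dtype=float)
    kernel = np.zeros(delay + len(pn))
    kernel[0] = 1.0
    kernel[delay:] = alpha * pn
    return np.convolve(x, kernel)[:len(x)]

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)
    return np.fft.irfft(log_mag, n=len(x))

def pn_correlation(x, pn, min_lag, max_lag):
    """Correlate the cepstrum of x with pn at each lag in [min_lag, max_lag)."""
    c = real_cepstrum(x)
    lags = np.arange(min_lag, max_lag)
    scores = np.array([np.dot(c[lag:lag + len(pn)], pn) for lag in lags])
    return lags, scores
```

The lag with the highest correlation score is the estimated delay. Lag 0 is skipped in practice because the first cepstrum coefficient is dominated by the overall signal level.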

11 of 17

Pseudorandom Time-Spread Echo Patterns

Longer durations are more robust

12 of 17

Pseudorandom Time-Spread Echo Patterns on RAVE and DDSP

Area under ROC curves
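For reference, the area under an ROC curve can be computed directly from detection scores as the probability that a watermarked example outscores a clean one. This is a generic sketch; the function name `auroc` and the use of z-scores as inputs are assumptions.

```python
import numpy as np

def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive score is higher, with ties counted as one half.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

A value of 1.0 means the two groups are perfectly separated, and 0.5 means the scores carry no information.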

13 of 17

Single Echoes Survive Pitch Shift Data Augmentation

Z-scores generally decrease as the probability of pitch augmentation increases, though the echoes remain detectable even at high rates of augmentation.

14 of 17

Mixed Echoes Can Be Demixed Using Demucs

Défossez, A.; Usunier, N.; Bottou, L.; and Bach, F. 2019. Music Source Separation in the Waveform Domain. arXiv preprint arXiv:1911.13254.

Rafii, Z.; Liutkus, A.; Stöter, F.-R.; Mimilakis, S. I.; and Bittner, R. 2019. MUSDB18-HQ - an uncompressed version of MUSDB18.

Original mix from Demucs

Mixed RAVE style transfer on individual tracks

15 of 17

Next Steps

  • Fine-tuning larger pretrained models
  • Watermark different parts of datasets with different watermarks

RAVE VocalSet male/female

Dance Diffusion Fine-Tuning

16 of 17

Special Thanks To Bill Mongan And Leslie New

For letting me run computers in their offices every day for 5 months straight…

17 of 17

Code, Supplementary Material, Etc.