DATA-DRIVEN SOUND AND INTERACTION DESIGN
Lonce Wyse
OVERVIEW
Intro
NEXT 5 WEEKS
Eyes-free
Games
Sound Modeling
Musical expectation
Modeling Gamakas in Carnatic Music
Anticipatory
Improvisation
Voice-controlled synthesis
Vibrotactile Musical Experience for the Deaf
Mobile platform for audience engagement
w/ Suranga Nanayakkara
w/ Srikumar Subramanian
w/ Trevor Penney, Annett Schirmer
w/ Stefano Fasciani
Arts and Creativity Lab
National University of Singapore
Sonic Bard
A CENTURY AGO
Edgard Varèse: “Liberation of sound”, and music as “organized sound”
Percy Grainger, “Free Music Machines” [link]
Luigi Russolo, Art of Noises manifesto (1913) [link]
Nikolai Kulbin, c. 1909
Francis Dhomont, Points de fuite (1982)
Analog, recording, & then computers….
MUSIC AND SOUND DESIGN
AUDIO VS SYMBOLIC
SOUND MODELS
But programming sound is hard….
OBJECTIVE: SOUND-TO-MODEL (“S2M”)
Data-driven modeling
NOVELTY
“Out of domain”
MORPHING
“Generalization”, Tweening sounds and Interpolation
Manually:
Karlheinz Stockhausen, Gesang der Jünglinge (1956), for electronics and tape manipulation
Trevor Wishart, Redbird: A Political Prisoner's Dream (1973-77)
The issue of time
Slaney (1993) – identify time/frequency “correspondences”
Compare to image morphing
Sounding object characteristics
Pruvost, L., Scherrer, B., Aramaki, M., Ystad, S., & Kronland-Martinet, R. (2015). Perception-based interactive sound synthesis of morphing solids' interactions. In SIGGRAPH Asia 2015 Technical Briefs (pp. 1-4).
MODELING & PLAYABILITY
GANSYNTH
Trained on 2D, 2-channel spectrograms:
magnitude & “instantaneous frequency” (IF – the time derivative of phase)
Engel, J., et al. "GANSynth: Adversarial neural audio synthesis." arXiv preprint arXiv:1902.08710 (2019).
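The IF channel can be sketched as follows: the instantaneous frequency at each time-frequency bin is the bin's center frequency plus the wrapped deviation of the measured frame-to-frame phase advance from the advance expected at that center. A numpy illustration (a simplified sketch, not the exact GANSynth implementation; all names are mine):

```python
import numpy as np

def instantaneous_frequency(phase, hop, nfft, sr):
    """IF in Hz per time-frequency bin: bin center frequency plus the
    wrapped deviation of the measured phase advance from the advance
    expected at that bin center. Illustrative sketch only."""
    bins = phase.shape[1]
    k = np.arange(bins)
    expected = 2 * np.pi * k * hop / nfft          # phase advance per hop at bin centers
    dev = np.diff(phase, axis=0) - expected        # deviation from that expectation
    dev = np.mod(dev + np.pi, 2 * np.pi) - np.pi   # wrap to (-pi, pi]
    return k * sr / nfft + dev * sr / (2 * np.pi * hop)

# A steady 440 Hz sinusoid should show IF ~ 440 Hz at its nearest bin
sr, hop, nfft = 16000, 256, 1024
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
frames = np.stack([x[i:i + nfft] * np.hanning(nfft)
                   for i in range(0, len(x) - nfft, hop)])
phase = np.angle(np.fft.rfft(frames))
if_hz = instantaneous_frequency(phase, hop, nfft, sr)
k440 = int(round(440 * nfft / sr))                 # bin nearest 440 Hz
```

Note the wrapping step: without subtracting the per-bin expected advance first, integer multiples of 2π would be lost and the recovered frequency would alias.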
But musical instruments ….
[Plot: instantaneous frequency]
GANS: A CLOSER LOOK
[Diagram: generator G maps a parameter vector to “fake” audio; discriminator D judges real vs. fake]
Generates n seconds of sound for each parameter configuration
OBJECTIVE FUNCTIONS
“Jensen-Shannon Divergence”
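For context, the standard GAN minimax objective, with an optimal discriminator, reduces to minimizing the Jensen-Shannon divergence between the data distribution and the generator's distribution. A minimal numpy sketch for discrete distributions (function names are mine):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence for discrete distributions."""
    mask = p > 0                      # 0 * log(0/q) contributes nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """JSD(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m), with m = (p + q)/2.
    Symmetric, zero iff p == q, and bounded above by log(2)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
d = js_divergence(p, q)
```

Unlike plain KL, the mixture `m` keeps both terms finite even when the two supports only partially overlap, which is one reason it appears in the GAN analysis.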
GAN “LATENT SPACE”
[Diagram: generator G maps a parameter vector to “fake” audio; discriminator D judges real vs. fake]
Generates n seconds of sound for each parameter configuration
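Navigating a latent space usually means interpolating between latent vectors. Since GAN latents are typically drawn from a spherical Gaussian, spherical interpolation (slerp) is often preferred to a straight line, which would pass through the low-density region near the origin. A sketch, not tied to any particular GAN implementation:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors: follows a
    great-circle-like arc rather than a chord, better preserving the
    norm statistics of Gaussian latents along the path."""
    u0, u1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if omega < 1e-8:                          # nearly parallel: plain lerp
        return (1 - t) * z0 + t * z1
    s = np.sin(omega)
    return np.sin((1 - t) * omega) / s * z0 + np.sin(t * omega) / s * z1

rng = np.random.default_rng(0)
z0, z1 = rng.normal(size=128), rng.normal(size=128)   # 128-D latents
path = [slerp(z0, z1, t) for t in np.linspace(0, 1, 9)]
```

Each vector along `path` would be fed to the generator to render one step of a smooth sonic transition.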
RNN: TRAINING DATA
Can it generalize? ….
Wyse, L. Real-valued parametric conditioning of an RNN for interactive sound synthesis. In Proceedings of the 6th International Workshop on Musical Metacreation, ACM Conference on Computational Creativity, Salamanca, Spain, June 2018.
[Diagram: stacked GRU layers; input x1 = previous audio sample plus conditioning parameters p1, p2, …, pn; output y1 = next audio sample]
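The parameter-conditioned, sample-by-sample generation loop can be sketched in numpy with a single (untrained, randomly weighted) GRU layer; the key point is that the conditioning parameters are re-appended to the fed-back audio sample at every step. All sizes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def gru_step(x, h, W, U, b):
    """One GRU step. x: input vector, h: hidden state. W, U, b each
    hold the update (z), reset (r), and candidate (n) blocks."""
    def sig(a): return 1 / (1 + np.exp(-a))
    z = sig(W['z'] @ x + U['z'] @ h + b['z'])
    r = sig(W['r'] @ x + U['r'] @ h + b['r'])
    n = np.tanh(W['n'] @ x + U['n'] @ (r * h) + b['n'])
    return (1 - z) * h + z * n

H, P = 16, 2                 # hidden size; params = (pitch, instrument)
D = 1 + P                    # input = previous audio sample + params
W = {k: rng.normal(0, 0.1, (H, D)) for k in 'zrn'}
U = {k: rng.normal(0, 0.1, (H, H)) for k in 'zrn'}
b = {k: np.zeros(H) for k in 'zrn'}
Wout = rng.normal(0, 0.1, (1, H))

# Autoregressive generation: each output sample is fed back in,
# with the conditioning parameters re-appended at every step.
params = np.array([0.5, 0.5])   # e.g. mid-pitch, midpoint instrument
h, sample, out = np.zeros(H), 0.0, []
for _ in range(100):
    x = np.concatenate(([sample], params))
    h = gru_step(x, h, W, U, b)
    sample = float(np.tanh(Wout @ h))
    out.append(sample)
```

Changing `params` mid-loop is what makes the model playable: the conditioning can be swept continuously while generation runs.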
GENERALIZATION (RNN)
Train: Trumpet and Clarinet,
12 pitches spanning the octave E4–E5
Generate: Clarinet,
continuous pitch sweep spanning the octave E4–E5
note playability
TRUMPINET
Generate: Trumpinet (midpoint instrument, inst=0.5),
continuous pitch sweep spanning the octave E4–E5
Train: Trumpet (inst=0) and Clarinet (inst=1),
12 pitches spanning the octave E4–E5
Where could we get *that* data?
IN THE DIGITAL LUTHIER’S TOOL SET
[Diagram: the two model types side by side]
GAN, the “Interpolator”: generator G and discriminator D; parameters in, real-data-vs-fake-data judgment out; a ~128-dimensional latent space; generates n seconds of sound for each parameter configuration.
RNN, the “Performer”: input x plus parameters, output y; generates 1 sample at a time.
Sound complexity?
WHAT KIND OF SOUND?
Examples
Dripping
Engine
Wind
Fire
Clarinet
Pops
Dripping2
Thus we are talking about a huge space (compared to speech or musical notes), but with some constraints amenable to modeling.
SYNTHETIC AUDIO TEXTURE DATASETS
Parameters
Argh! Why am I still coding synths!?
clap
clapper
applause
SynTex
[Weblink]
Wyse, L., & Ravikumar, P. T. SynTex: Parametric audio texture datasets for conditional training of instrumental interfaces. New Interfaces for Musical Expression (NIME), 2022.
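A SynTex-style generator can be sketched as a synthesis function whose control parameter is stored alongside each rendered clip, yielding labeled (audio, parameter) pairs for conditional training. This hypothetical "pop" texture (not an actual SynTex model) exposes an event-rate parameter:

```python
import numpy as np

def pop_texture(rate, dur=1.0, sr=16000, seed=0):
    """Hypothetical SynTex-style texture: decaying-noise 'pops' at a
    controllable average rate (events/sec), normalized to [-1, 1]."""
    rng = np.random.default_rng(seed)
    n = int(dur * sr)
    x = np.zeros(n)
    env = np.exp(-np.arange(int(0.01 * sr)) / (0.002 * sr))  # 10 ms decay
    onsets = np.sort(rng.integers(0, n - len(env), rng.poisson(rate * dur)))
    for t in onsets:
        x[t:t + len(env)] += env * rng.normal(0, 1, len(env))
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

# A labeled dataset: (audio, parameter) pairs across the control range
dataset = [(pop_texture(r, seed=i), r)
           for i, r in enumerate(np.linspace(2, 50, 5))]
```

Because the generating parameter is known exactly for every clip, no human labeling is needed, which is the point of synthesizing the dataset rather than recording it.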
PURE “STYLE” -- VARIATION AND EXTENSION
Original short clip
Continuation
Continuation
Continuation
WHAT ABOUT “DYNAMIC” TEXTURES?
Water fill
But are images good representations for textures?
Content?
NATURE OF THE GAN LATENT SPACE
GAN SUBSPACE SELECTION AND SMOOTHING
Wyse, L., Gupta, C., & Kamath, P. Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling. EvoMusArt, 2022.
OK – Let’s put it all together!
Teuvo Kohonen’s Self-Organizing Map (“SOM”)
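A minimal SOM sketch in numpy, of the kind that could arrange high-dimensional latent or feature vectors on a 2D grid for navigation (all hyperparameters here are illustrative):

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, seed=0):
    """Minimal Kohonen SOM: each iteration picks a random sample,
    finds its best-matching unit (BMU), and pulls the BMU and its
    grid neighborhood toward the sample, with a decaying learning
    rate and a shrinking neighborhood radius."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    nodes = rng.normal(size=(gx * gy, data.shape[1]))
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    for t in range(iters):
        frac = t / iters
        lr = 0.5 * (1 - frac)                        # decaying learning rate
        sigma = max(gx, gy) / 2 * (1 - frac) + 0.5   # shrinking neighborhood
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.sum((nodes - x) ** 2, axis=1))
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        nbhd = np.exp(-d2 / (2 * sigma ** 2))
        nodes += lr * nbhd[:, None] * (x - nodes)
    return nodes.reshape(gx, gy, -1)

# Two well-separated clusters should occupy different grid regions
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-5, 0.5, (100, 4)), rng.normal(5, 0.5, (100, 4))])
som = train_som(data)
```

The trained grid gives a topology-preserving 2D layout: neighboring grid cells hold similar sounds, which is exactly the property a navigation surface needs.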
THE BIG PICTURE
“Interpolator”
“Performer”
RNN TEXTURE MODELING
Regular
Random
[Diagram: stacked GRU layers, as above; input = previous audio sample plus conditioning parameters p1, p2, …, pn; output = next audio sample]
Recorded data, too
DESIGNING NAVIGATION
GAN Generator
Designers choice!
Parameters from labeled data (pitch, roughness)
128 D “latent” space
BOREILLY TEXTURE EXAMPLE
4 points define a 2D space.
128 D “latent” space
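Mapping the designer's four chosen points to the full latent space can be done by bilinear interpolation: a 2D pad position (u, v) blends four corner latents. A sketch (corner labels and dimensions are illustrative):

```python
import numpy as np

def bilinear_latent(corners, u, v):
    """Map a 2D pad position (u, v) in [0,1]^2 to a latent vector by
    bilinear interpolation of four designer-chosen corner latents.
    corners: dict with keys '00', '10', '01', '11' -> latent vectors."""
    return ((1 - u) * (1 - v) * corners['00'] + u * (1 - v) * corners['10']
            + (1 - u) * v * corners['01'] + u * v * corners['11'])

rng = np.random.default_rng(0)
corners = {k: rng.normal(size=128) for k in ('00', '10', '01', '11')}
z = bilinear_latent(corners, 0.25, 0.75)   # a point inside the pad
```

The designer's choice of corner latents determines what the 2D surface "means" sonically; the interpolation just fills in the interior.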
NAVIGATION
[Weblink]
FAKE TWEEN DATA FOR TRAINING MORPHS
With no tween data for training,
The RNN fails to morph
With FAKE tween data for training,
The RNN can learn.
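One way to picture the fake-tween idea: label synthetic mixtures of the two endpoint sounds with interpolated conditioning values, so the network sees training targets between inst=0 and inst=1. In the actual pipeline the tween audio comes from GAN latent-space interpolation; here a simple crossfade stands in for it:

```python
import numpy as np

def fake_tweens(clip_a, clip_b, n_steps=5):
    """Hypothetical tween-data generator: pair mixtures of two endpoint
    sounds with interpolated conditioning values in [0, 1]. A crossfade
    stands in for the GAN-generated tween audio of the real system."""
    pairs = []
    for a in np.linspace(0, 1, n_steps):
        audio = (1 - a) * clip_a + a * clip_b
        pairs.append((audio, a))        # (tween audio, conditioning value)
    return pairs

sr = 16000
t = np.arange(sr) / sr
trumpet = np.sin(2 * np.pi * 330 * t)            # stand-ins for recordings
clarinet = np.sin(2 * np.pi * 330 * 3 * t) * 0.5
tweens = fake_tweens(trumpet, clarinet)
```

With these intermediate (audio, parameter) pairs in the training set, the conditioning dimension is densely covered, so the RNN has something to learn between the real instrument classes.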
Real time continuous tweening of pitch and timbre
[Weblink]
Pitch conditioned, instruments located in z-space
BREAK
OTHER NAVIGATION STRATEGIES
RECENT EXPLORATIONS
Gupta, C., Kamath, P., Wei, Y. Z., Li, Z., Nanayakkara, S., & Wyse, L. (2023). Towards Controllable Audio Texture Morphing. International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
QUERY BY SYNTHESIS
Kamath, P., Li, Z., Gupta, C., Jaidka, K., Nanayakkara, S., & Wyse, L. (2023). An Example-Based Framework for Perceptually Guided Audio Texture Generation. ACM Multimedia
Also a good example of how to get control of the latent space in a GAN
DATA-DRIVEN SOUND MODELING
Sound
Model
Factory
NUS colleagues: Chitralekha Gupta, Purnima Kamath