x-to-audio: General Audio Synthesis From Various Input Prompts
Keisuke Imoto
Doshisha University
Jun. 27, 2024
What is generative AI?
text-to-speech
text-to-text
speech-to-speech
text-to-image
x-to-audio synthesis (xTA)
Applications of general audio synthesis
General audio synthesis using event label (label-to-audio)
※ https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
Wave generation using WaveNet
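To make the label-to-audio idea concrete, here is a minimal NumPy sketch of one WaveNet-style layer: a causal dilated convolution followed by a gated activation, with a one-hot event label added as a global condition. It is an illustration only, not the model used in this work. The function names, the two-tap filter, the single channel, and the random weights are all assumptions made for the example.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Two-tap causal convolution: y[t] = w[0] * x[t - dilation] + w[1] * x[t]."""
    # Shift the signal right by `dilation` samples, padding with zeros,
    # so that y[t] never depends on samples after time t.
    past = np.concatenate([np.zeros(dilation), x])[: len(x)]
    return w[0] * past + w[1] * x

def gated_layer(x, w_f, w_g, v_f, v_g, label, dilation):
    """Gated activation tanh(filter) * sigmoid(gate) with a global label bias."""
    h_f = causal_dilated_conv(x, w_f, dilation) + v_f @ label
    h_g = causal_dilated_conv(x, w_g, dilation) + v_g @ label
    return np.tanh(h_f) * (1.0 / (1.0 + np.exp(-h_g)))

# Hypothetical setup: three event classes, one-hot label for "maracas".
event_classes = ["alarm clock", "maracas", "coffee grinder"]
label = np.eye(len(event_classes))[1]

rng = np.random.default_rng(0)
x = rng.standard_normal(16)            # stand-in for past waveform samples
w_f, w_g = rng.standard_normal(2), rng.standard_normal(2)
v_f, v_g = rng.standard_normal(3), rng.standard_normal(3)

out = gated_layer(x, w_f, w_g, v_f, v_g, label, dilation=4)
print(out.shape)
```

Changing the one-hot label shifts the filter and gate inputs for every time step, which is how a single network can be steered toward different sound event classes.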
Quiz
Alarm clock
Maracas
Coffee grinder
Challenges in label-to-audio synthesis
structure of sounds
Synthesized sound 1
Synthesized sound 2
Synthesized sound 3
Synthesized sound 4
text-to-audio (TTA) synthesis
Training and sampling process of AudioLDM
Example of text-to-audio (TTA) synthesis
Onomatopoeia-to-audio synthesis
peep
beeeeeep
Example of synthesized sounds
Variations of synthesized sounds w/ onoma-to-wave
Synthesized sound by /ch i N q/ (チンッ)
Synthesized sound by /ch i: N q/ (チィンッ)
Synthesized sound by /p i N q/ (ピンッ)
voice-to-audio conversion
+ k-means
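The "+ k-means" step can be pictured as turning frame-level features of a vocal imitation into a sequence of discrete tokens. The sketch below assumes exactly that reading, which the slide does not spell out. The `kmeans` helper, the feature dimension, the number of clusters, and the random stand-in features are all hypothetical.

```python
import numpy as np

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, per-frame cluster indices)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[idx].astype(float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:       # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

# Stand-in for frame-level features of a vocal imitation (100 frames, 8 dims).
rng = np.random.default_rng(1)
frames = rng.standard_normal((100, 8))

centroids, tokens = kmeans(frames, k=4)
print(tokens[:10])  # one discrete token per frame
```

The resulting token sequence keeps the timing of the input, one token per frame, while discarding fine speaker-specific detail.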
peep
beeeeeep
Example of synthesized sounds from vocal imitation
Pitch- and rhythm-changed input vocal imitation
Image-to-speech/audio synthesis
[Ohnaka+ 2023]
Example of synthesized sound by visual Onoma-to-wave
repetitions
Example of synthesized sound by visual Onoma-to-wave
repetitions of different width
Conclusion
of general sounds, not just speech or music