Music Composition Using Neural Networks
Shen Ting Ang
Data Science SG, 19 Oct 2017
Brief Autobio
© Shen Ting Ang 2017
Collaborators!
My teammates for UCSD CSE253 Final Project (Mar 2016):
Patrick Hsu, Anand Desai, Feichao Qian, Olga Souverneva
Popular Topic?
Of the 23 groups in the class, 6 (including ours) chose this topic back in Mar 2016.
Spoiler: Ours scored the highest.
Other popular topics: Image Captioning, Image Classification, etc.
Aims
Big Question 1: How to model music?
Two main types of music representation:
Pros and cons of each?
Waveform Representation
Raw Waveforms are hard to use for modelling!
Common feature representation: Mel-Frequency Cepstral Coefficients (MFCCs)
End result: each window of audio is represented by a vector of coefficients (usually about 10-50 per window)
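As a rough, dependency-free sketch of how one window of audio becomes one coefficient vector: frame the signal, take a power spectrum, pool it with triangular mel-spaced filters, take logs, then apply a DCT. The frame length, hop, filter count, coefficient count and the 2595/700 mel formula are illustrative assumptions, not settings from the project; in practice a library implementation would normally be used.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping windows of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Hamming-window the frame, then take a naive DFT power spectrum."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    edges = [mel_to_hz(hz_to_mel(nyquist) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([
            (f - lo) / (mid - lo) if lo < f <= mid
            else (hi - f) / (hi - mid) if mid < f < hi
            else 0.0
            for f in bin_hz
        ])
    return bank

def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=12, n_coeffs=10):
    """Return one vector of n_coeffs coefficients per window."""
    bank = mel_filterbank(n_filters, frame_len // 2 + 1, sample_rate)
    vectors = []
    for frame in split_frames(signal, frame_len, hop):
        spectrum = power_spectrum(frame)
        # Log energy in each mel band (small constant avoids log(0)).
        log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                      for row in bank]
        # DCT-II decorrelates the band energies into cepstral coefficients.
        vectors.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)
        ])
    return vectors

# 0.1 s of a 440 Hz tone sampled at 8 kHz (synthetic test signal).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = mfcc(tone, sample_rate)
print(len(features), "windows x", len(features[0]), "coefficients")
```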
Waveform Representation (MFCC)
Waveform Representation (Attempt)
Data: Bach Goldberg Variations - Failed!
Sheet Music
Notation Representation (MIDI)
Messaging Protocol - “Note on, note off”
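To make the "note on, note off" idea concrete: in the MIDI 1.0 wire format, each of these channel messages is a status byte (message type in the high nibble, channel in the low nibble) followed by a note number and a velocity. The helper names below are made up for this sketch.

```python
NOTE_ON = 0x90   # high nibble of the status byte for "note on"
NOTE_OFF = 0x80  # high nibble of the status byte for "note off"

def _message(kind, note, velocity, channel):
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("note/velocity must be 0-127, channel 0-15")
    return bytes([kind | channel, note, velocity])

def note_on(note, velocity=64, channel=0):
    """Start sounding a note."""
    return _message(NOTE_ON, note, velocity, channel)

def note_off(note, velocity=0, channel=0):
    """Stop sounding a note."""
    return _message(NOTE_OFF, note, velocity, channel)

def decode(message):
    """Turn a 3-byte note message back into (kind, channel, note, velocity)."""
    status, note, velocity = message
    kind = {NOTE_ON: "note_on", NOTE_OFF: "note_off"}[status & 0xF0]
    return kind, status & 0x0F, note, velocity

MIDDLE_C = 60
events = [note_on(MIDDLE_C), note_off(MIDDLE_C)]
for event in events:
    print(event.hex(), decode(event))
```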
Covers:
Notation Representation (ABC)
“Textual” representation of MIDI
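A minimal sketch of why ABC can be read as a textual stand-in for MIDI: each note token maps to a MIDI pitch number and a duration. It assumes the standard ABC conventions (`C` is middle C, lowercase is one octave up, `^`/`_`/`=` are sharp/flat/natural, `'` and `,` shift octaves, `3` and `/` scale the length) and deliberately ignores key signatures, rests, chords and decorations. The function name is hypothetical.

```python
import re
from fractions import Fraction

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"^^": 2, "^": 1, "=": 0, "_": -1, "__": -2, None: 0}

NOTE_RE = re.compile(
    r"(?P<acc>\^\^|\^|__|_|=)?"   # optional accidental
    r"(?P<name>[A-Ga-g])"         # note letter
    r"(?P<octave>[,']*)"          # octave marks
    r"(?P<num>\d*)(?P<slash>/?)(?P<den>\d*)"  # length, e.g. 3, /, /4
)

def parse_notes(abc_body):
    """Return (midi_pitch, length_in_units) for each note token found."""
    notes = []
    for m in NOTE_RE.finditer(abc_body):
        name = m.group("name")
        pitch = (60 if name.isupper() else 72) + SEMITONES[name.upper()]
        pitch += ACCIDENTALS[m.group("acc")]
        octave = m.group("octave")
        pitch += 12 * (octave.count("'") - octave.count(","))
        num = int(m.group("num") or 1)
        # A bare "/" means "divide by 2".
        den = int(m.group("den") or 2) if m.group("slash") else 1
        notes.append((pitch, Fraction(num, den)))
    return notes

bar = "F ^G A B/c/4d/4"
for pitch, length in parse_notes(bar):
    print(pitch, length)
```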
ABC Notation Example
T:291. Was frag’ ich nach der Welt
A|F E/D/ A A|HB3 B|E E A G|F E HD A|B B A G|
HF3 E|F ^G A B/c/4d/4|c B/A/ HA A|A A d =c|HB3 B|
B B e d|Hc3 A|B A B c|Hd3 A|A G/F/ E/F/4G/4 E|HD3|]
Data Set (ABC Notation)
Bach Chorale Example
Why use ABC Notation?
Character-Level RNN Text Generation
Training:
Generating:
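Both phases can be sketched without any deep-learning library. Training data is a set of (window of characters, next character) pairs; generation repeatedly asks the model for a next-character distribution, samples from it, and slides the window forward. Here `predict` is a hypothetical stand-in for a trained network, and the temperature parameter is an assumption of this sketch rather than something stated in the deck.

```python
import math
import random

def build_vocab(text):
    """Map each distinct character to an integer index."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def training_pairs(text, seq_len):
    """Training: every window of seq_len characters predicts the next one."""
    return [(text[i:i + seq_len], text[i + seq_len])
            for i in range(len(text) - seq_len)]

def apply_temperature(probs, temperature):
    """Sharpen (T < 1) or flatten (T > 1) a probability distribution."""
    logits = [math.log(p + 1e-12) / temperature for p in probs]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(predict, seed, length, vocab, temperature=1.0, rng=None):
    """Generating: sample one character at a time, feeding output back in."""
    rng = rng or random.Random(0)
    chars = sorted(vocab, key=vocab.get)
    text = seed
    for _ in range(length):
        window = text[-len(seed):]
        probs = apply_temperature(predict(window), temperature)
        text += rng.choices(chars, weights=probs)[0]
    return text

corpus = "A|F E/D/ A A|B3 B|"
vocab = build_vocab(corpus)
pairs = training_pairs(corpus, seq_len=4)

def uniform_predict(window):
    # Placeholder for a trained model: every character equally likely.
    return [1.0 / len(vocab)] * len(vocab)

sample = generate(uniform_predict, seed=corpus[:4], length=12, vocab=vocab)
print(pairs[0], repr(sample))
```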
Character-Level RNN Text Generation
Source: Andrej Karpathy
RNN vs LSTM vs GRU
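One concrete way to compare the three cell types is by trainable parameter count. In the textbook formulations, a simple RNN has one weight block, a GRU has three (update gate, reset gate, candidate) and an LSTM has four (input, forget and output gates plus the candidate). The sketch below assumes those textbook forms; framework implementations may add a few extra bias terms, and the sizes used are illustrative.

```python
GATE_BLOCKS = {"RNN": 1, "GRU": 3, "LSTM": 4}

def recurrent_params(cell, input_dim, units):
    """Each block has input weights, recurrent weights and a bias vector."""
    per_block = units * (input_dim + units + 1)
    return GATE_BLOCKS[cell] * per_block

# Illustrative sizes: 50 one-hot input characters, 128 hidden units.
for cell in ("RNN", "GRU", "LSTM"):
    print(cell, recurrent_params(cell, input_dim=50, units=128))
```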
Easy Implementation with Python/Keras
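For illustration only, a character-level model of this kind might look like the following in present-day Keras (the `tensorflow.keras` API, which differs from the 2016-era interface). This is not the project's code: the window length, vocabulary size, layer width and single-layer structure are all assumptions.

```python
from tensorflow import keras

SEQ_LEN = 25      # assumed window length, in characters
VOCAB_SIZE = 50   # assumed number of distinct characters in the ABC corpus

def build_model(seq_len=SEQ_LEN, vocab_size=VOCAB_SIZE, units=128):
    """One LSTM layer followed by a softmax over the next character."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, vocab_size)),   # one-hot characters
        keras.layers.LSTM(units),
        keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
    return model

model = build_model()
model.summary()
```

Swapping `keras.layers.LSTM` for `keras.layers.SimpleRNN` or `keras.layers.GRU` is a one-line change, which makes comparisons between cell types cheap to set up.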
Simple Network Structure
Low Footprint Execution
Evaluation and Data Sets (Recap)
LSTM has a lower loss than RNN
RNN
LSTM
Loss Decreases with Deeper Architectures
RMSprop gives fastest convergence for loss
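For reference, the RMSprop update divides each gradient by a running root-mean-square of recent gradients, so the effective step size stays roughly constant even when gradient magnitudes vary. A minimal one-dimensional sketch, with hyperparameters and a toy quadratic objective chosen purely for illustration:

```python
import math

def rmsprop_minimise(grad, x0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    """Minimise a 1-D function given its gradient, using RMSprop."""
    x, avg_sq = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        # Exponential moving average of squared gradients.
        avg_sq = rho * avg_sq + (1 - rho) * g * g
        # Step scaled by the running RMS of the gradient.
        x -= lr * g / (math.sqrt(avg_sq) + eps)
    return x

# Toy objective f(x) = x^2, whose gradient is 2x and whose minimum is at 0.
x_final = rmsprop_minimise(lambda x: 2 * x, x0=5.0)
print(x_final)
```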
Big Question 2: How to Evaluate?
Objective Measures?
No AUC, Accuracy, etc.
What about Loss?
Does it make sense to evaluate music on objective measures? Do these even exist?
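To make "loss" concrete: assuming the networks were trained with per-character categorical cross-entropy (the deck does not say which loss was used), the number reported is the average negative log-probability the model assigns to the true next character. It measures next-character prediction only. The function names are hypothetical.

```python
import math

def cross_entropy(predicted, targets):
    """Mean negative log-probability of the true next characters.

    predicted: list of probability vectors, one per position
    targets:   list of indices of the characters that actually came next
    """
    total = -sum(math.log(probs[t]) for probs, t in zip(predicted, targets))
    return total / len(targets)

def perplexity(loss):
    """Effective number of equally likely choices implied by the loss."""
    return math.exp(loss)

# A model that is clueless over a 4-character vocabulary...
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3
# ...versus one that is fairly confident and correct.
confident = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
targets = [0, 1, 2]

print(cross_entropy(uniform, targets), cross_entropy(confident, targets))
```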
An Objective Evaluation Method
Euler’s Gradus Suavitatis measure of melodiousness (1739) - higher is better
Data set (model) | Gradus Suavitatis score |
Aird (Original) | 7.059 |
Aird (RNN) | 6.191 |
Aird (LSTM) | 5.743 |
Aird (GRU) | 5.890 |
Bach Chorales (Original) | 6.821 |
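For readers unfamiliar with the measure, here is a sketch of Euler's definition for a single interval. For an integer n with prime factorisation p1^a1 · p2^a2 · ..., the gradus is 1 + Σ ai·(pi − 1); for an interval with frequency ratio a:b it is the gradus of lcm(a, b). In Euler's own definition smaller values correspond to simpler ratios. How the per-data-set scores above were aggregated from individual intervals, and how their direction was defined, is not stated in the deck, so this sketch does not attempt to reproduce them.

```python
from math import gcd

def prime_factors(n):
    """Return {prime: exponent} for a positive integer n."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def gradus(n):
    """Euler's gradus suavitatis of a positive integer."""
    return 1 + sum(k * (p - 1) for p, k in prime_factors(n).items())

def interval_gradus(a, b):
    """Gradus of the interval with frequency ratio a:b."""
    return gradus(a * b // gcd(a, b))

for name, (a, b) in {"octave": (2, 1), "fifth": (3, 2),
                     "fourth": (4, 3), "major third": (5, 4)}.items():
    print(name, interval_gradus(a, b))
```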
Subjective Evaluation by Humans
Used extensively in Text-To-Speech Generation studies!
Using a similar idea, ask participants to rate each sample on a scale of 1-10 for:
Also: do you think this was “composed” by a computer? (yes/no)
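A small sketch of how such responses could be summarised per sample: the mean rating (analogous to a mean opinion score in speech studies) and the fraction of listeners who guessed "computer". The response values below are invented purely to show the calculation; they are not the study's data.

```python
from statistics import mean

def summarise(responses):
    """responses: list of (rating from 1-10, guessed_computer as bool)."""
    ratings = [rating for rating, _ in responses]
    guesses = [guessed for _, guessed in responses]
    return {
        "mean_rating": mean(ratings),
        "computer_fraction": sum(guesses) / len(guesses),
    }

# Hypothetical responses from four listeners for one sample.
example = [(8, False), (6, True), (7, False), (5, True)]
summary = summarise(example)
print(summary)
```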
Setting up Subjective Evaluation
25 Volunteers were presented with 10 samples:
Range of Human Subjects
Mean Age | 28.56 |
Median Age | 26 |
Std Dev of Age | 8.84 |
Music Professionals | 4 |
Some Musical Background | 15 |
No Musical Background | 6 |
Human Evaluation
“Best” Network for task?
Appears to be:
Demo Music (Used in Evaluation)
Monophonic 1: https://www.youtube.com/watch?v=-Fvt2lzLEGo
Monophonic 2: https://www.youtube.com/watch?v=GHOWlDg_bM4
Polyphonic 1: https://www.youtube.com/watch?v=0AAT83i3op0
Polyphonic 2: https://www.youtube.com/watch?v=rOJxTYwWRUc
Answers
Monophonic 1: Original
Monophonic 2: LSTM
Polyphonic 1: LSTM
Polyphonic 2: Original
Sheet Music of Polyphonic Output (Nottingham)
Sheet Music of Polyphonic Output (Bach)
Defects of Generated Music
Discussion - WaveNets
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
Discussion - Composition vs Arrangement
Discussion - Human vs Machine?
Is machine music composition adversarial to human composers?
Conclusions
Acknowledgements
Teammates: Patrick, Anand, Feichao, Olga
CSE253 Teaching Staff (Prof Cottrell and TAs)
Friends and Family who volunteered their time to be evaluators
Accenture (for today’s venue) and DSSG (for the invite)
References
MFCCs:
Gradus Suavitatis:
Text Generation on RNN:
ABC Notation:
Music Composition with RNNs:
Google WaveNets:
References (From Yitch)
Adobe Voco - “Photoshop of Voice”
SongSim
Further Questions?