1 of 15

Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization

Longshen Ou, Jingwei Zhao, Ziyu Wang, Gus Xia, Qihao Liang, Torin Hopkins, and Ye Wang

Fine-tuned decoder-only Transformer

2 of 15

What is Music Arrangement?

Transforms the same musical idea into new instrumental forms
Preserving its essence while changing timbre and interpretation

Original music

Acoustic cover

Strings trio

Orchestra

4 of 15

Problem Definition

Generate new tracks from existing tracks
that either reinterpret the original music or remain compatible with it

Original music

Music with new instrument combination

Solo instrument

Music with added parts

Reinterpret

Simplify

Add tracks

5 of 15

Methodology

Paired arrangement data are extremely rare

Version 1

Version 2

Model

6 of 15

Methodology

Learn reconstruction instead

Music

Original Music

Model

What is played

By whom it is played

Music

Arranged Music

Model

What is played

By whom we want it to be played

User

(Content)

(Style)

Training

Inference

(Ori. Content)

(New Style)

7 of 15

Unsupervised Reconstruction Objective

The condition contains three subsequences: instrument, flattened note stream, and target-side history
Fine-tuned from a pre-trained decoder-only model
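
The three-part condition can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the marker tokens (`<inst>`, `<content>`, `<history>`, `<target>`), the helper names, and the choice to merge duplicated notes in the flattened stream are all hypothetical. The token prefixes (o-, i-, p-, d-) follow the legend on the tokenization slide.

```python
# Notes are (onset, instrument, pitch, duration) tuples; values are illustrative.

def flatten_content(notes):
    """Drop instrument labels and merge all tracks into one time-ordered stream.
    Notes doubled across instruments are merged (an assumption of this sketch)."""
    stream = sorted({(onset, pitch, dur) for onset, _inst, pitch, dur in notes})
    tokens = []
    for onset, pitch, dur in stream:
        tokens += [f"o-{onset}", f"p-{pitch}", f"d-{dur}"]
    return tokens

def instrument_tokens(instruments):
    """One token per distinct instrument."""
    return [f"i-{i}" for i in sorted(set(instruments))]

def build_condition(instruments, content_notes, history_tokens):
    """Concatenate the three subsequences; the model would continue after <target>."""
    return (["<inst>"] + instrument_tokens(instruments)
            + ["<content>"] + flatten_content(content_notes)
            + ["<history>"] + list(history_tokens)
            + ["<target>"])

bar = [(0, 0, 60, 4), (0, 33, 36, 8)]

# Training: the instruments come from the bar itself, so the target is a reconstruction.
train_cond = build_condition({inst for _, inst, _, _ in bar}, bar, [])

# Inference: same content, but the user supplies the instruments
# (40, 41, 42 are violin, viola, cello in 0-indexed General MIDI).
infer_cond = build_condition([40, 41, 42], bar, [])
```

Only the instrument subsequence differs between the two conditions, which is what lets a model trained on reconstruction be steered toward a new instrumentation at inference time.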

8 of 15

REMI-z: Alleviate Content Fragmentation

REMI-z (Ours)

REMI+

Preserves track continuity to ease learning of idiomatic instrument behavior

Time-ordered tokenization interleaves the content of different instruments

o-X: onset time, i-X: instrument, p-X: pitch, d-X: duration, b-1: end of bar

Content of the same instrument is shown in the same color
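
The difference between the two layouts can be sketched in a few lines. This is a simplified illustration using the token legend above, not the actual REMI+ or REMI-z implementation: the function names are hypothetical, and the order in which tracks are emitted (by instrument number) is an assumption of the sketch.

```python
from collections import defaultdict

# One bar of notes as (onset, instrument, pitch, duration); values are illustrative.
BAR = [
    (0, 0, 60, 4),
    (0, 33, 36, 8),
    (4, 0, 64, 4),
    (8, 33, 43, 8),
]

def time_ordered_tokens(notes):
    """REMI+-like layout: notes sorted by onset, each note carries its instrument."""
    tokens = []
    for onset, inst, pitch, dur in sorted(notes):
        tokens += [f"o-{onset}", f"i-{inst}", f"p-{pitch}", f"d-{dur}"]
    return tokens + ["b-1"]

def track_ordered_tokens(notes):
    """REMI-z-like layout: one instrument token, then all of that track's notes."""
    tracks = defaultdict(list)
    for onset, inst, pitch, dur in notes:
        tracks[inst].append((onset, pitch, dur))
    tokens = []
    for inst in sorted(tracks):  # track order is an assumption of this sketch
        tokens.append(f"i-{inst}")
        for onset, pitch, dur in sorted(tracks[inst]):
            tokens += [f"o-{onset}", f"p-{pitch}", f"d-{dur}"]
    return tokens + ["b-1"]

def instrument_switches(tokens):
    """Count changes between consecutive instrument tokens (a rough fragmentation measure)."""
    insts = [t for t in tokens if t.startswith("i-")]
    return sum(a != b for a, b in zip(insts, insts[1:]))

print(time_ordered_tokens(BAR))
print(track_ordered_tokens(BAR))
```

In the time-ordered sequence the instrument changes at every note of this bar, while the track-ordered sequence keeps each instrument's notes contiguous and states the instrument only once per track.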

9 of 15

Tasks

10 of 15

Key Findings

Our model performs consistently and significantly better than task-specific baselines, both in objective metrics and in human evaluations.
Our tokenization scheme improves every evaluated aspect of arrangement performance.
Pre-training plays a crucial role.

11 of 15

Demos

Original

Arrangement for string trio

Example: Band → String Trio

12 of 15

Takeaways

Unified pipeline

No task-specific model design

Unsupervised training

Learn interpretation by reconstruction

Effective tokenization

Track continuity facilitates learning music structure

13 of 15

Bonus

REMI-z simplifies note-level modeling

with lower uncertainty

Arrangement -> General generation tasks

MIDI

Piano Roll

Tab*

REMI-z seq

MultiTrack

A Hierarchical data structure

(a bar seq)

Melody

Chord

Key normalization

Manipulation

pip install REMI-z

the REMI-z package

*Ongoing work

Bar

Track

Note

github.com/Sonata165/REMI-z
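
The MultiTrack / Bar / Track / Note hierarchy named on this slide could be modeled roughly as follows. This is a hypothetical sketch with plain dataclasses, not the API of the REMI-z package: the field names and the `transpose` and `pitch_range` helpers are assumptions chosen to show how a bar-sequence structure supports simple manipulation.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    onset: int      # position within the bar
    pitch: int      # MIDI pitch number
    duration: int

@dataclass
class Track:
    instrument: int                 # MIDI program number
    notes: list = field(default_factory=list)

@dataclass
class Bar:
    tracks: list = field(default_factory=list)

@dataclass
class MultiTrack:
    bars: list = field(default_factory=list)   # a piece is a sequence of bars

    def transpose(self, semitones):
        """Return a transposed copy, leaving the original untouched.
        Drum tracks are not special-cased in this sketch."""
        return MultiTrack([
            Bar([
                Track(t.instrument,
                      [Note(n.onset, n.pitch + semitones, n.duration) for n in t.notes])
                for t in b.tracks
            ])
            for b in self.bars
        ])

    def pitch_range(self):
        """Lowest and highest pitch across all bars and tracks."""
        pitches = [n.pitch for b in self.bars for t in b.tracks for n in t.notes]
        return min(pitches), max(pitches)

song = MultiTrack([
    Bar([
        Track(0, [Note(0, 60, 4), Note(4, 64, 4)]),
        Track(33, [Note(0, 36, 8)]),
    ])
])
up = song.transpose(2)
```

Because every level is a plain container, operations such as transposition reduce to nested traversals of bars, tracks, and notes.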

15 of 15

Thanks

Demo, code, and model available online.

www.oulongshen.xyz/automatic_arrangement