Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Taiwei Shi1, Yiyang Wu1, Linxin Song1, Tianyi Zhou2, Jieyu Zhao1
1University of Southern California 2University of Maryland, College Park

1. Introduction

ADARFT

  • Adaptive curriculum learning to improve reinforcement finetuning (RFT) for large language models
  • A novel algorithm that dynamically matches training task difficulty with the model’s evolving capabilities (sketched below)
  • Compatible with various RL algorithms (e.g., PPO, GRPO)
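
A minimal sketch of that loop, for illustration only (not the authors' exact implementation): pick the problems closest to a current target difficulty, run one RL step with any policy-gradient algorithm (PPO, GRPO, etc.), then nudge the target difficulty toward problems the model solves at the desired rate. The `rl_update` callable and `difficulty` lookup are assumed interfaces; the default values mirror the hyperparameters listed later on the poster.

```python
# Hedged sketch of the adaptive-curriculum RFT loop (illustrative only).
# Assumptions: difficulty[p] is a precomputed score in [0, 100] for each
# problem p, and rl_update(batch) performs one PPO/GRPO step and returns
# the per-problem rewards (e.g., 0/1 correctness) observed on that batch.
import math

def select_batch(problems, difficulty, target, batch_size):
    """Pick the problems whose difficulty is closest to the current target."""
    ranked = sorted(problems, key=lambda p: abs(difficulty[p] - target))
    return ranked[:batch_size]

def adarft_loop(problems, difficulty, rl_update, num_steps,
                beta=0.5, alpha=2.0, eta=50.0,
                min_difficulty=0.0, max_difficulty=100.0,
                target=0.0, batch_size=1024):
    for _ in range(num_steps):
        batch = select_batch(problems, difficulty, target, batch_size)
        rewards = rl_update(batch)                  # one RL finetuning step
        avg_reward = sum(rewards) / len(rewards)
        # If the model succeeds more often than the target rate beta, raise the
        # difficulty; if less often, lower it. tanh keeps each step bounded.
        target += eta * math.tanh(alpha * (avg_reward - beta))
        target = min(max(target, min_difficulty), max_difficulty)
    return target
```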

2. Difficulty Estimation
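
The estimation details of this panel did not survive extraction. As one plausible placeholder (an assumption, not necessarily the authors' recipe), a problem's score on the [0, 100] difficulty scale used by the curriculum can be approximated from a reference model's failure rate over k sampled attempts:

```python
# Illustrative difficulty estimator (assumed, not the poster's exact method):
# sample k candidate solutions from a reference model, grade them, and map
# the failure rate onto the [0, 100] difficulty scale used by the curriculum.
def estimate_difficulty(attempt_is_correct, problem, k=8):
    """attempt_is_correct(problem) -> bool; returns a score in [0, 100]."""
    successes = sum(attempt_is_correct(problem) for _ in range(k))
    return 100.0 * (1.0 - successes / k)  # 0 = always solved, 100 = never solved
```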

4. Discussion: Data Difficulty on Model Performance

Qwen 2.5 7B trained on different data distributions using PPO (Uniform, Easy-Extreme, Hard-Extreme) and ADARFT instantiated with PPO (Uniform + ADARFT).

  • Target reward β = 0.5 → learn at a balanced success rate
  • Sensitivity parameter α = 2, step size η = 50 → ensure stable curriculum updates
  • Difficulty range = [0, 100], initial target difficulty T = 0, batch size B = 1024
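
As a quick check of what these settings imply (assuming binary 0/1 rewards and the tanh-shaped update consistent with the margin note at the bottom of the poster), the snippet below evaluates the target-difficulty shift η · tanh(α(r̄ − β)) for a few average rewards; the extremes reproduce the ±38.08 bound noted there.

```python
# What the listed hyperparameters imply for one curriculum update, assuming
# binary 0/1 rewards and a tanh-shaped update (tanh(±1) ≈ ±0.7616).
import math

beta, alpha, eta = 0.5, 2.0, 50.0
for avg_reward in (0.0, 0.25, 0.5, 0.75, 1.0):
    shift = eta * math.tanh(alpha * (avg_reward - beta))
    print(f"avg reward {avg_reward:.2f} -> target difficulty shift {shift:+6.2f}")
# Extremes (all wrong / all right) shift the target by about -38.08 / +38.08;
# an average reward equal to beta leaves the target difficulty unchanged.
```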

3. Difficulty Distribution & Result

  • 10,000 samples in each set

  • Better Reasoning Performance
  • Higher Training Efficiency

[Figure: training-efficiency bar chart, y-axis “Steps Saved (%)” — 26 (44%), 29 (48%), 64 (107%), 44 (73%) steps saved with ADARFT across settings]

Note on update magnitude: with binary rewards R ∈ {0, 1} and target reward β = 0.5, the scaled error α(r̄ − β) lies in [−1, 1]; since tanh(±1) ≈ ±0.7616, each curriculum update shifts the target difficulty by at most η · 0.7616 ≈ ±38.08, keeping the schedule stable across updates.