Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

Taiwei Shi1, Yiyang Wu1, Linxin Song1, Tianyi Zhou2, Jieyu Zhao1
1University of Southern California 2University of Maryland, College Park

1. Introduction

ADARFT

  • Adaptive curriculum learning to improve reinforcement finetuning (RFT) for large language models
  • A novel algorithm that dynamically matches training task difficulty with the model’s evolving capabilities (sketched below)
  • Compatible with various RL algorithms (e.g., PPO, GRPO)
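
A minimal sketch of that loop, for illustration only (not the authors' exact implementation): pick the problems closest to a current target difficulty, run one RL step with any policy-gradient algorithm (PPO, GRPO, etc.), then nudge the target difficulty toward problems the model solves at the desired rate. The `rl_update` callable and `difficulty` lookup are assumed interfaces; the default values mirror the hyperparameters listed later on the poster.

```python
# Hedged sketch of the adaptive-curriculum RFT loop (illustrative only).
# Assumptions: difficulty[p] is a precomputed score in [0, 100] for each
# problem p, and rl_update(batch) performs one PPO/GRPO step and returns
# the per-problem rewards (e.g., 0/1 correctness) observed on that batch.
import math

def select_batch(problems, difficulty, target, batch_size):
    """Pick the problems whose difficulty is closest to the current target."""
    ranked = sorted(problems, key=lambda p: abs(difficulty[p] - target))
    return ranked[:batch_size]

def adarft_loop(problems, difficulty, rl_update, num_steps,
                beta=0.5, alpha=2.0, eta=50.0,
                min_difficulty=0.0, max_difficulty=100.0,
                target=0.0, batch_size=1024):
    for _ in range(num_steps):
        batch = select_batch(problems, difficulty, target, batch_size)
        rewards = rl_update(batch)                  # one RL finetuning step
        avg_reward = sum(rewards) / len(rewards)
        # If the model succeeds more often than the target rate beta, raise the
        # difficulty; if less often, lower it. tanh keeps each step bounded.
        target += eta * math.tanh(alpha * (avg_reward - beta))
        target = min(max(target, min_difficulty), max_difficulty)
    return target
```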

2. Difficulty Estimation
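
The estimation details of this panel did not survive extraction. As one plausible placeholder (an assumption, not necessarily the authors' recipe), a problem's score on the [0, 100] difficulty scale used by the curriculum can be approximated from a reference model's failure rate over k sampled attempts:

```python
# Illustrative difficulty estimator (assumed, not the poster's exact method):
# sample k candidate solutions from a reference model, grade them, and map
# the failure rate onto the [0, 100] difficulty scale used by the curriculum.
def estimate_difficulty(attempt_is_correct, problem, k=8):
    """attempt_is_correct(problem) -> bool; returns a score in [0, 100]."""
    successes = sum(attempt_is_correct(problem) for _ in range(k))
    return 100.0 * (1.0 - successes / k)  # 0 = always solved, 100 = never solved
```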

4. Discussion: Data Difficulty on Model Performance

Qwen 2.5 7B trained on different data distributions using PPO (Uniform, Easy-Extreme, Hard-Extreme) and ADARFT instantiated with PPO (Uniform + ADARFT).

  • Target reward β = 0.5 → learn at a balanced success rate
  • Sensitivity parameter α = 2, step size η = 50 → ensure stable curriculum updates
  • Difficulty range = [0, 100], initial target difficulty T = 0, batch size B = 1024
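
As a quick check of what these settings imply (assuming binary 0/1 rewards and the tanh-shaped update consistent with the margin note at the bottom of the poster), the snippet below evaluates the target-difficulty shift η · tanh(α(r̄ − β)) for a few average rewards; the extremes reproduce the ±38.08 bound noted there.

```python
# What the listed hyperparameters imply for one curriculum update, assuming
# binary 0/1 rewards and a tanh-shaped update (tanh(±1) ≈ ±0.7616).
import math

beta, alpha, eta = 0.5, 2.0, 50.0
for avg_reward in (0.0, 0.25, 0.5, 0.75, 1.0):
    shift = eta * math.tanh(alpha * (avg_reward - beta))
    print(f"avg reward {avg_reward:.2f} -> target difficulty shift {shift:+6.2f}")
# Extremes (all wrong / all right) shift the target by about -38.08 / +38.08;
# an average reward equal to beta leaves the target difficulty unchanged.
```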

3. Difficulty Distribution & Result

  • 10,000 samples in each set

  • Better Reasoning Performance
  • Higher Training Efficiency

[Figure: training-efficiency bar chart, y-axis “Steps Saved (%)” — 26 (44%), 29 (48%), 64 (107%), 44 (73%) steps saved with ADARFT across settings]

Note on update magnitude: with binary rewards R ∈ {0, 1} and target reward β = 0.5, the scaled error α(r̄ − β) lies in [−1, 1]; since tanh(±1) ≈ ±0.7616, each curriculum update shifts the target difficulty by at most η · 0.7616 ≈ ±38.08, keeping the schedule stable across updates.