GitHub: https://github.com/efficientscaling/Z1

Zhaojian Yu

OpenAI o1

o1 - Test-time scaling

How can we reduce a model’s thinking-token consumption while preserving its reasoning performance?

Z1: Efficient Test-time Scaling with Code

Z1-Reasoning (data) + Shifted Thinking (inference) = Z1

Collect 107K questions

Evolving on Complexity

Original Problem → LLM → Complex Problem
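The evolve step can be sketched as a loop that wraps each original problem in a rewriting prompt and asks an LLM for a harder variant. This is a minimal sketch under stated assumptions, not the authors' pipeline. The prompt wording, the helper names `evolve_prompt` and `evolve`, the `rounds` parameter, and the choice to keep originals alongside their evolved versions are all hypothetical. `llm` stands for any function that maps a prompt string to a completion.

```python
from typing import Callable, List

# Hypothetical rewriting instruction: the actual prompt used to build the Z1 data
# is not shown on the slides.
EVOLVE_TEMPLATE = (
    "Rewrite the following problem so that it is more complex, for example by "
    "adding constraints or requiring more reasoning steps. "
    "Return only the new problem.\n\nOriginal problem:\n{problem}"
)


def evolve_prompt(problem: str) -> str:
    """Build the prompt asking an LLM for a harder variant of `problem`."""
    return EVOLVE_TEMPLATE.format(problem=problem)


def evolve(problems: List[str], llm: Callable[[str], str], rounds: int = 1) -> List[str]:
    """Return the original problems followed by `rounds` successively evolved versions.

    `llm` is any function mapping a prompt string to a completion string.
    """
    pool = list(problems)
    current = list(problems)
    for _ in range(rounds):
        # Each round rewrites the previous round's outputs, so complexity compounds.
        current = [llm(evolve_prompt(p)).strip() for p in current]
        pool.extend(current)
    return pool


if __name__ == "__main__":
    # Stub LLM for a dry run: it echoes the original problem with an extra requirement.
    def stub_llm(prompt: str) -> str:
        original = prompt.rsplit("Original problem:\n", 1)[1]
        return original + " Handle an empty input as well."

    for problem in evolve(["Sort a list of integers."], stub_llm, rounds=2):
        print(problem)
```

To use it for real, pass a chat-completion client as `llm` in place of the stub. Feeding each round's outputs back in lets complexity compound. The cost is one extra LLM call per problem per round.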

Distill reasoning traces & answers

Simple Problem:

Convert the following query into a T-SQL statement.
SELECT Employee_ID, Name
FROM Employees
WHERE Salary > 50000

Complex Problem:

In my machine learning course we have covered the key ideas behind principal component analysis. To round this part of the course off, we have learned to interpret the results of PCA, specifically using a biplot. Have I missed something, or is PCA mostly something that is used in conjunction with other ML methods to obtain a result? For example, in a biplot I can tell which observations are similar and how much each variable is contributing overall to a specific component, but how is this useful outside of employing something like clustering after-the-fact? From my point of view, it feels like PCA is the 'first-step' towards gaining some sort of insight but it doesn't tell you much that is useful on its own.

Short Trajectory (Simple Problem):

Your query is already written in T-SQL. The same statement would go as:

```sql
SELECT Employee_ID, Name
FROM Employees
WHERE Salary > 50000
```

Long Trajectory (Complex Problem):

Principal Component Analysis (PCA) is indeed a fundamental technique in data analysis and machine learning, and your intuition about it being a first step is correct. However, PCA can be quite useful on its own in several ways beyond just being a preprocessing step for other methods like clustering. Let's explore some of these applications in more detail…
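The distillation step above can be sketched as one teacher call per problem. Each call keeps both the reasoning trace and the final answer, and the code records each trace's length, so simple problems yield short trajectories and complex ones yield long trajectories. This is a minimal sketch, assuming a hypothetical `teacher` callable that returns `(trajectory, answer)`. The `Sample` record, `distill`, and `length_summary` are illustrative names. Whitespace splitting is only a stand-in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    problem: str
    trajectory: str  # distilled reasoning trace
    answer: str      # final answer produced after the trace
    num_tokens: int  # trajectory length (whitespace tokens as a rough proxy)


def distill(problems: List[str], teacher: Callable[[str], Tuple[str, str]]) -> List[Sample]:
    """Query a (hypothetical) teacher model once per problem; keep its trace and answer."""
    samples = []
    for problem in problems:
        trajectory, answer = teacher(problem)
        # A real pipeline would count tokens with the student model's tokenizer.
        samples.append(Sample(problem, trajectory, answer, len(trajectory.split())))
    return samples


def length_summary(samples: List[Sample]) -> Tuple[int, float, int]:
    """Return (min, mean, max) trajectory length over a non-empty list of samples."""
    lengths = [s.num_tokens for s in samples]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)
```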

Shifted Thinking

Budget forcing (s1):

Shifted Thinking (Z1):
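To make the comparison concrete, here is a toy sketch of both decoding strategies over a token-level step function.

For budget forcing, it models only the maximum-budget side of s1: once the budget is spent, it closes the thinking block and appends "Final Answer:". The minimum-budget side (suppressing the end-of-thinking delimiter and appending "Wait") is omitted.

The shifted-thinking function encodes an assumption about Z1, not a confirmed implementation: no `<think>` delimiters, a hard cap on reasoning tokens, and a hypothetical hint phrase appended when the cap is hit. `decode`, `Step`, `EOS`, and the `hint` argument are illustrative names.

```python
from typing import Callable, List

# A decoding step maps the token context so far to the next token (a stand-in for a model).
Step = Callable[[List[str]], str]
EOS = "<eos>"  # simplified end-of-sequence token


def decode(step: Step, context: List[str], max_tokens: int, stop: str) -> List[str]:
    """Generate up to `max_tokens` tokens, stopping early once `stop` is emitted."""
    out: List[str] = []
    while len(out) < max_tokens:
        tok = step(context + out)
        out.append(tok)
        if tok == stop:
            break
    return out


def budget_forcing(step: Step, prompt: List[str], budget: int, answer_len: int = 32) -> List[str]:
    """s1-style maximum budget: if thinking has not closed within `budget` tokens,
    append the end-of-thinking delimiter and "Final Answer:" to force an early exit."""
    ctx = prompt + ["<think>"]
    thought = decode(step, ctx, budget, stop="</think>")
    ctx += thought
    if not thought or thought[-1] != "</think>":
        ctx += ["</think>", "Final Answer:"]
    ctx += decode(step, ctx, answer_len, stop=EOS)
    return ctx


def shifted_thinking(step: Step, prompt: List[str], budget: int,
                     hint: List[str], answer_len: int = 32) -> List[str]:
    """Assumed Z1-style variant: no <think> delimiters; if the model has not finished
    within `budget` tokens, append a (hypothetical) hint asking for the answer directly."""
    ctx = list(prompt)
    trace = decode(step, ctx, budget, stop=EOS)
    ctx += trace
    if trace and trace[-1] == EOS:
        return ctx  # finished within the budget, nothing to force
    ctx += hint
    ctx += decode(step, ctx, answer_len, stop=EOS)
    return ctx
```

The control flow shows where the two designs differ. Budget forcing needs explicit thinking delimiters to know where to cut. The delimiter-free variant only has to count tokens and decide whether to append the hint.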

Training & Results

Training

Base model: Qwen2.5-Coder-7B-Instruct

Training time: 12 hours

Hardware: 8 A100 GPUs
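For scale, the setup above works out to the following total training compute, assuming all eight GPUs ran for the full 12 hours:

```latex
8 \times 12\ \text{h} = 96\ \text{A100 GPU-hours}
```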

Z1 - Efficient reasoning

Main Results

Non-Reasoning Benchmark

Data ablations

Example
