GitHub: https://github.com/efficientscaling/Z1
Zhaojian Yu
OpenAI o1
o1 - Test-time scaling
How can we reduce the model's thinking-token consumption while preserving its reasoning performance?
Z1: Efficient Test-time Scaling with Code
Data: Z1-Reasoning
+
Inference: Shifted Thinking
=
Z1
Collect 107K questions
Evolving on Complexity
Original Problem
Complex Problem
LLM
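The evolving step above (original problem → LLM → complex problem) can be sketched as a simple rewrite loop. This is a minimal sketch under stated assumptions: `ask_llm` stands in for any chat-completion call, and the prompt wording is illustrative, not the paper's actual template.

```python
# Complexity evolution sketch: an LLM rewrites each collected question into
# a more complex variant. `ask_llm` is a placeholder for any LLM call; the
# prompt below is illustrative, not the paper's actual prompt.

EVOLVE_PROMPT = (
    "Rewrite the following programming problem into a more complex version "
    "that requires deeper reasoning, while keeping it self-contained.\n\n"
    "Problem: {problem}"
)

def evolve(problem: str, ask_llm) -> str:
    """Return a more complex variant of `problem` produced by the LLM."""
    return ask_llm(EVOLVE_PROMPT.format(problem=problem))

def build_pairs(problems, ask_llm):
    """Keep (original, evolved) pairs so both ends of the complexity
    spectrum are available for trajectory distillation."""
    return [(p, evolve(p, ask_llm)) for p in problems]
```

In practice `ask_llm` would wrap a real model endpoint; here it is left abstract so the pipeline shape is clear.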
Collect 107K questions
Distill reasoning traces & answers

Simple Problem:
Convert the following query into a T-SQL statement.
SELECT Employee_ID, Name
FROM Employees
WHERE Salary > 50000

Short Trajectory:
Your query is already written in T-SQL. The same statement would go as:
```sql
SELECT Employee_ID, Name
FROM Employees
WHERE Salary > 50000
```

Complex Problem:
In my machine learning course we have covered the key ideas behind principal component analysis. To round this part of the course off, we have learned to interpret the results of PCA, specifically using a biplot. Have I missed something, or is PCA mostly something that is used in conjunction with other ML methods to obtain a result? For example, in a biplot I can tell which observations are similar and how much each variable is contributing overall to a specific component, but how is this useful outside of employing something like clustering after-the-fact? From my point of view, it feels like PCA is the 'first-step' towards gaining some sort of insight, but it doesn't tell you much that is useful on its own.

Long Trajectory:
Principal Component Analysis (PCA) is indeed a fundamental technique in data analysis and machine learning, and your intuition about it being a first step is correct. However, PCA can be quite useful on its own in several ways beyond just being a preprocessing step for other methods like clustering. Let's explore some of these applications in more detail…
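The distilled (problem, trajectory) pairs then become supervised fine-tuning examples. A minimal sketch of that packaging step, assuming a standard chat-style format; the field names are illustrative, not the paper's actual schema:

```python
# Package a distilled problem/trajectory pair into a chat-style SFT example.
# The "messages" schema here is a common convention, assumed for illustration.

def to_sft_example(problem: str, trajectory: str) -> dict:
    """Return one training example: the problem as the user turn, the
    distilled reasoning trajectory (short or long) as the assistant turn."""
    return {
        "messages": [
            {"role": "user", "content": problem},
            {"role": "assistant", "content": trajectory},
        ]
    }
```

Simple problems carry short trajectories and complex problems carry long ones, so the model learns to scale its reasoning length to problem difficulty.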
Shifted Thinking
Budget forcing (s1):
Shifted Thinking (Z1):
Training & Results
Training
Qwen2.5-Coder-7B-Instruct
12 hours
8 A100s
Z1 - Efficient reasoning
Main Results
Non-Reasoning Benchmark
Data ablations
Example