1 of 21

A: (ChatGPT 4o) Alice ha due fratelli e tre sorelle. Ogni fratello di Alice ha gli stessi fratelli e sorelle di Alice, tranne se stesso. Quindi, ogni fratello ha un fratello (l'altro fratello di Alice) e tre sorelle (Alice e le altre due sorelle). Ogni fratello di Alice ha tre sorelle.

A: (ChatGPT 4.5) Ogni fratello di Alice ha quattro sorelle.

Ragionamento: Alice ha due fratelli e tre sorelle. Le sorelle sono dunque Alice + altre 3 sorelle, per un totale di 4 sorelle. Ogni fratello di Alice avrà quindi tutte e 4 queste sorelle (Alice inclusa).

Quindi, la risposta è: 4 sorelle

A: (o1) <reasoning> Clarifying the scenario: I'm piecing together information to determine that Alice has two brothers and three sisters, which totals six siblings. Revisiting Alice's family: Alice has two brothers and three sisters, totaling six siblings, including herself. The phrase "due fratelli e tre sorelle" clarifies this.

<answer> Se Alice ha due fratelli e tre sorelle, in totale ci sono quindi 6 figli in famiglia: Alice stessa, altri 2 fratelli e 3 sorelle. Per ciascuno dei fratelli, le sorelle sono tutte le figlie femmine della famiglia tranne se stesso (ovviamente). Dunque: Le sorelle sono Alice e le altre 3 sorelle (4 femmine in tutto). Di conseguenza, ogni fratello di Alice ha 4 sorelle.

5 of 21

Behind the curtain

Almost no technical details have been revealed about the structure and the training of o1:

But then…

Also due to this,�some implicit messages spread out:

reasoning models need to be large
and as such require huge amounts of resources (as for the Stargate project)
and therefore only a few US companies can develop them

[...]

https://openai.com/index/learning-to-reason-with-llms/#hiding-the-chains-of-thought

6 of 21

https://arxiv.org/pdf/2501.12948

7 of 21

What DeepSeek did

Developed an LLM, V3 (671B)
Basing on it, developed an LRM, R1
Distilling from it, created a training set of (800k) curated reasoning traces
Basing on various versions of Qwen and Llama, developed several SRM
Openly released all models

https://huggingface.co/deepseek-ai/DeepSeek-R1

8 of 21

https://www.cnbc.com/2025/01/27/nvidia-sheds-almost-600-billion-in-market-cap-biggest-drop-ever.html�https://www.google.com/search?q=nvidia+stock

9 of 21

“It thinks before it answers”: an (uncensored) example

Question: Vuoi propormi un titolo creativo e arguto, ma formalmente corretto, per una lezione a proposito di "modelli di ragionamento", come tu sei?

Thoughts: Okay, so I need to come up with a creative and witty title for a lesson about reasoning models. Let me think about synonyms for reasoning models. They could be thought of as frameworks, structures, blueprints, or even tools for thinking. I want the title to sound both professional and intriguing. Maybe something like "Navigando i Modelli di Ragionamento" which means "Navigating Reasoning Models." That gives a sense of journey or exploration. Or perhaps "Modelli di Ragionamento: Chiavi per il Pensiero Critico," meaning "Reasoning Models: Keys to Critical Thinking." That emphasizes the unlocking aspect.�Wait, maybe using "Strumenti del Ragionamento" could work too: "tools of reasoning". So something like "Strumenti del Ragionamento: Architettura della Logica" translating to "Tools of Reasoning: Architecture of Logic," which sounds technical yet creative.�I should also consider if the title is too complex or not clear enough. It needs to be formal but still catchy. Maybe combining words that evoke both structure and creativity, like "Modelli di Ragionamento: Disegnare il Vostro Pensiero." That means "Reasoning Models: Design Your Thinking," which suggests a creative process.�Another thought is using action verbs to make it dynamic, like "Decodificando i Modelli di Ragionamento" or "Decoding Reasoning Models." It gives the sense of uncovering something valuable. Or perhaps "Modelli di Ragionamento: La Struttura del Pensiero Efficace," meaning "Reasoning Models: The Structure of Effective Thinking."�I should also think about the audience—probably students or professionals looking to enhance their reasoning skills. So, the title should resonate with them, offering both learning and practical application. Maybe combining a metaphor with structure: "Modelli di Ragionamento: Il ponte tra idea e realtà" meaning "Reasoning Models: The bridge between idea and reality." That suggests that reasoning models help connect thoughts to actionable outcomes. Alternatively, something like "Modelli di Ragionamento: Costruire il Vostro Sistema Logico," which translates to "Reasoning Models: Build Your Logical System." It emphasizes construction and building, which is proactive.�I think I've got a few options now. Each has its own angle—some focus on structure, others on action or metaphor. The best might be one that's both clear and evokes the right emotions about reasoning models.

Answer: Modelli di Ragionamento: Il Ponte Tra Idea e Realtà�Questo titolo combina elegantemente chiarezza con un tocco metaforico, suggerendo che i modelli di ragionamento servono da ponte che collega idee astratte a risultati pratici. Invita gli studenti a immaginare il loro percorso dalla concettualizzazione all'implementazione, sottolineando sia gli aspetti strutturali che quelli trasformativi dei modelli di ragionamento.

10 of 21

From Language Models to Reasoning Models

The prompt used in training by DeepSeek:

A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e.,

<think> reasoning process here </think> <answer> answer here </answer>

https://openai.com/index/learning-to-reason-with-llms

11 of 21

(a very short introduction to reinforcement learning)

(are you aware of https://github.com/openai/gym, first (openly…) released in 2016?)

But of course one prompt is not sufficient…

12 of 21

Learning / training strategies

Imitation / supervised learning�Given a request whose correct answer is known:�

1. the request is submitted to the chatbot

2. the chatbot produces a response

3. the response is matched with the correct one

4. if there is a difference, the behavior is adjusted

and the process is repeated with other requests

Requirements:

– a set of pairs (request, correct response)�– a method for matching the responses�– a method for adjusting the behavior

Trial-and-error / reinforcement learning�Given a request whose possible answers�have a quality that can be evaluated:

1. the request is submitted to the chatbot

2. the chatbot produces a response

3. the quality of the response is evaluated

4. the behavior is adjusted to improve the quality

and the process is repeated with other requests

Requirements:

– a method for evaluating the quality� of the responses to the requests� (e.g., a “reward” function f(request, response))�– a method for adjusting the behavior

13 of 21

A matter of learning / training, as usual…

“There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised fine tuning), and 2) trial-and-error learning (reinforcement learning). [...] Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. [...] 2 is the ‘aha moment’ when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It’s the solving strategies you see this model use[s] in its chain of thought. It’s how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.”

https://x.com/karpathy/status/1883941452738355376

14 of 21

Three layers?

1. Programmed machines

2. Learning machines trained by imitation

3. Learning machines trained by trial-and-error

15 of 21

What is happening?

12 Sep 24: OpenAI o1

28 Nov 24: Qwen QwQ-32B preview

21 Jan 25: DeepSeek-R1

31 Jan 25: OpenAI o3-mini

31 Jan 25: Simplescaling s1-32B (ArXiv)

2 Feb 25: OpenAI deep research

5 Feb 25: Google Gemini 2.0 Flash Thinking

5 Feb 25: GAIR LIMO (ArXiv)

12 Feb 25: Open thoughts OpenThinker-32B (announcement)

27 Feb 25: OpenAI GPT-4.5

5 Mar 25: Qwen QwQ-32B

16 of 21

Evaluating the quality of “reasoning”: an example

The American Invitational Mathematics Examination (AIME) is … used to challenge�bright students on the path toward choosing the team that represents the United States at the International Mathematics Olympiad. … Most AIME participants are high school students.�… The AIME is a 15 question, 3 hour exam.�… Each answer is an integer from 000 to 999, inclusive, making guessing almost futile.�… No aids other than scratch paper, rulers and compasses are permitted.�(https://artofproblemsolving.com/wiki/index.php/American_Invitational_Mathematics_Examination)

An example (AIME 2025 I, problem 1):�Find the sum of all integer bases b>9 for which 17_b is a divisor of 97_b.

Some data:�https://docs.google.com/spreadsheets/d/1iiUpuF67QhNaTvJsHcvN2PEg07GTjwMgo2TAisPILO4

17 of 21

Some reflections: openness

A significant part of the development around GenAI is publicly accessible,�in particular on ArXiv, GitHub, and Hugging Face

This has made the dissemination of information practically instantaneous:�it took only a few days from the publication of the first open reasoning model�for other models to be developed, not only in the United States

18 of 21

Some reflections: accessibility and standardization

The (hardware and software) barriers to entry have been lowered

All RMs are in fact the result of the same kind of process:�a pretrained transformer is taken and further trained with some sort of CoT

Pretrained transformers are now commodities�(e.g., the RMs from Simple Scaling, GAIR, and Open Thoughts, are all based on Qwen 2.5-32B)

It is as if pre-trained transformers have become universal enablers:� “give me a pretrained transformer and I will build a language model� with increasingly better capabilities”

19 of 21

Some reflections: targets and role of data

“Reasoning” capabilities are the result of increasingly sophisticated training,�and to this purpose the quality of data is not less important than their quantity

(DeepSeek-R1-distill: 800k; OpenThinker: 100k; s1 and LIMO: 1k reasoning traces)

“While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data�(>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively�elicited with surprisingly few examples.” (https://arxiv.org/abs/2502.03387)

“Small is beautiful” then (LIMO: “Less Is MOre”...)

And data is increasingly synthetic or synthetically verified: