Chatting with an AI… - Ep4
LLMs evolution and Deepseek
This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License
Objective
Trying to understand, and to systematize, what has happened and is happening, from a cultural perspective…
… mainly building on the ideas proposed in Ep1, Ep2, and Ep3
Ep1 (in Italian) https://docs.google.com/presentation/d/1jkJBAHgOKglKJ-8XjxnWFjcPKt_J95nJIJ8Sa-kMumk
Ep2 (in English) https://docs.google.com/presentation/d/1g2FpgfEJ01HSdtKH5nS1flKkAM9gu5oFxq-WtT9hNwQ
Ep3 (slides in English) https://docs.google.com/presentation/d/1KEs_Twaue7UOxXD77K5unecRYheLTUFkN4IT_3s7dn8
Just yesterday: the 2024 ACM A.M. Turing Award to Andrew G. Barto and Richard S. Sutton, for reinforcement learning: https://awards.acm.org/about/2024-turing
Language Models and Reasoning Models
Q: If Alice has two brothers and three sisters, how many sisters does each of Alice's brothers have?
(conversations held in March 2025, originally in Italian)
A: (ChatGPT 4o) Alice has two brothers and three sisters. Each of Alice's brothers has the same brothers and sisters as Alice, except himself. So, each brother has one brother (Alice's other brother) and three sisters (Alice and the other two sisters). Each of Alice's brothers has three sisters.
A: (ChatGPT 4.5) Each of Alice's brothers has four sisters.
Reasoning: Alice has two brothers and three sisters. The sisters are therefore Alice + 3 other sisters, for a total of 4 sisters. Each of Alice's brothers will thus have all 4 of these sisters (Alice included).
So, the answer is: 4 sisters
A: (o1) <reasoning> Clarifying the scenario: I'm piecing together information to determine that Alice has two brothers and three sisters, which totals six siblings. Revisiting Alice's family: Alice has two brothers and three sisters, totaling six siblings, including herself. The phrase "due fratelli e tre sorelle" clarifies this.
<answer> If Alice has two brothers and three sisters, there are therefore 6 children in the family in total: Alice herself, plus 2 brothers and 3 sisters. For each of the brothers, the sisters are all the female children of the family except himself (obviously). Hence: the sisters are Alice and the other 3 sisters (4 girls in all). Consequently, each of Alice's brothers has 4 sisters.
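The arithmetic behind the riddle is trivial once framed correctly; a tiny sketch (all names are ours, purely illustrative) makes the point the models struggle with explicit:

```python
# Alice has 2 brothers and 3 sisters; how many sisters does each brother have?
brothers_of_alice = 2
sisters_of_alice = 3

# All the girls in the family: Alice's sisters plus Alice herself.
girls_in_family = sisters_of_alice + 1

# A brother excludes himself from the count of brothers, not of sisters,
# so his sisters are all the girls in the family.
sisters_per_brother = girls_in_family

print(sisters_per_brother)  # -> 4
```

The common mistake (as in the 4o answer above) is to forget that Alice herself counts as a sister from a brother's point of view.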
Behind the curtain
Almost no technical details have been revealed about the structure and the training of o1:
But then…
Partly because of this, some implicit messages spread:
What DeepSeek did
“It thinks before it answers”: an (uncensored) example
Question: Would you propose a creative and witty, yet formally correct, title for a lesson about "reasoning models", such as yourself?
Thoughts: Okay, so I need to come up with a creative and witty title for a lesson about reasoning models. Let me think about synonyms for reasoning models. They could be thought of as frameworks, structures, blueprints, or even tools for thinking. I want the title to sound both professional and intriguing. Maybe something like "Navigando i Modelli di Ragionamento" which means "Navigating Reasoning Models." That gives a sense of journey or exploration. Or perhaps "Modelli di Ragionamento: Chiavi per il Pensiero Critico," meaning "Reasoning Models: Keys to Critical Thinking." That emphasizes the unlocking aspect.
Wait, maybe using "Strumenti del Ragionamento" could work too: "tools of reasoning". So something like "Strumenti del Ragionamento: Architettura della Logica" translating to "Tools of Reasoning: Architecture of Logic," which sounds technical yet creative.
I should also consider if the title is too complex or not clear enough. It needs to be formal but still catchy. Maybe combining words that evoke both structure and creativity, like "Modelli di Ragionamento: Disegnare il Vostro Pensiero." That means "Reasoning Models: Design Your Thinking," which suggests a creative process.
Another thought is using action verbs to make it dynamic, like "Decodificando i Modelli di Ragionamento" or "Decoding Reasoning Models." It gives the sense of uncovering something valuable. Or perhaps "Modelli di Ragionamento: La Struttura del Pensiero Efficace," meaning "Reasoning Models: The Structure of Effective Thinking."
I should also think about the audience—probably students or professionals looking to enhance their reasoning skills. So, the title should resonate with them, offering both learning and practical application. Maybe combining a metaphor with structure: "Modelli di Ragionamento: Il ponte tra idea e realtà" meaning "Reasoning Models: The bridge between idea and reality." That suggests that reasoning models help connect thoughts to actionable outcomes.
Alternatively, something like "Modelli di Ragionamento: Costruire il Vostro Sistema Logico," which translates to "Reasoning Models: Build Your Logical System." It emphasizes construction and building, which is proactive.
I think I've got a few options now. Each has its own angle—some focus on structure, others on action or metaphor. The best might be one that's both clear and evokes the right emotions about reasoning models.
Answer: Modelli di Ragionamento: Il Ponte Tra Idea e Realtà ("Reasoning Models: The Bridge Between Idea and Reality"). This title elegantly combines clarity with a metaphorical touch, suggesting that reasoning models act as a bridge connecting abstract ideas to practical outcomes. It invites students to imagine their path from conceptualization to implementation, highlighting both the structural and the transformative aspects of reasoning models.
From Language Models to Reasoning Models
The prompt used in training by DeepSeek:
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e.,
<think> reasoning process here </think> <answer> answer here </answer>
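The tagged output format above can be consumed mechanically; a minimal sketch (the tag names come from the DeepSeek prompt, the function name and example are ours):

```python
import re

def split_reasoning(completion: str):
    """Split an R1-style completion into (reasoning, answer).

    Assumes the <think>...</think> <answer>...</answer> format
    shown in the DeepSeek training prompt.
    """
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else completion.strip(),
    )

reasoning, answer = split_reasoning(
    "<think> 2 brothers, 3 sisters, plus Alice herself </think> <answer> 4 </answer>"
)
print(answer)  # -> 4
```

Separating the two spans is also what makes it possible to show the user only the answer while keeping (or hiding) the chain of thought.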
(a very short introduction to reinforcement learning)
(are you aware of https://github.com/openai/gym, first (openly…) released in 2016?)
But of course one prompt is not sufficient…
Learning / training strategies
Imitation / supervised learning
Given a request whose correct answer is known:
1. the request is submitted to the chatbot
2. the chatbot produces a response
3. the response is matched with the correct one
4. if there is a difference, the behavior is adjusted
and the process is repeated with other requests
Requirements:
– a set of pairs (request, correct response)
– a method for matching the responses
– a method for adjusting the behavior
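The four steps can be sketched with a deliberately tiny numeric stand-in for the chatbot (all names and numbers here are illustrative, not from the slides):

```python
# The "chatbot": one adjustable parameter w; its "response" to a request x is w * x.
w = 0.0
learning_rate = 0.05

# Requirement: a set of (request, correct response) pairs.
# The correct behavior, unknown to the learner, is w = 2.
dataset = [(x, 2.0 * x) for x in range(1, 6)]

for _ in range(100):                          # repeat the process with other requests
    for request, correct in dataset:
        response = w * request                # 1-2: submit the request, get a response
        error = response - correct            # 3: match the response with the correct one
        w -= learning_rate * error * request  # 4: if there is a difference, adjust

print(round(w, 3))  # -> 2.0
```

The "method for matching" here is a numeric difference and the "method for adjusting" is gradient descent; for an LLM both are far more elaborate, but the loop has the same shape.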
Trial-and-error / reinforcement learning
Given a request whose possible answers have a quality that can be evaluated:
1. the request is submitted to the chatbot
2. the chatbot produces a response
3. the quality of the response is evaluated
4. the behavior is adjusted to improve the quality
and the process is repeated with other requests
Requirements:
– a method for evaluating the quality of the responses to the requests (e.g., a “reward” function f(request, response))
– a method for adjusting the behavior
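The contrast with the imitation loop can be sketched with the same toy "chatbot" (again, all names and numbers are illustrative): no correct answer is ever shown, only a reward that scores each response, and the behavior is adjusted by trial and error.

```python
import random

random.seed(0)

def reward(request, response):
    # Requirement: a reward function f(request, response).
    # Hidden from the learner: good behavior is response = 2 * request.
    return -abs(response - 2.0 * request)

w = 0.0  # the "chatbot": its response to request x is w * x
for _ in range(2000):                          # repeat with other requests
    request = random.uniform(1, 5)             # 1: a request is submitted
    candidate = w + random.gauss(0, 0.1)       # a slightly perturbed behavior
    # 2-3: produce responses and evaluate their quality
    if reward(request, candidate * request) > reward(request, w * request):
        w = candidate                          # 4: keep the adjustment if quality improves

print(round(w, 2))  # close to 2.0
```

Here the adjustment is simple hill climbing on the reward; DeepSeek-R1 uses a far more sophisticated policy-optimization algorithm, but the key ingredient is the same: only a quality signal, never a worked-out correct answer.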
A matter of learning / training, as usual…
“There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised fine tuning), and 2) trial-and-error learning (reinforcement learning). [...] Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. [...] 2 is the ‘aha moment’ when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It’s the solving strategies you see this model use[s] in its chain of thought. It’s how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.”
Three layers?
1. Programmed machines
2. Learning machines trained by imitation
3. Learning machines trained by trial-and-error
What is happening?
12 Sep 24: OpenAI o1
28 Nov 24: Qwen QwQ-32B preview
21 Jan 25: DeepSeek-R1
31 Jan 25: OpenAI o3-mini
31 Jan 25: Simplescaling s1-32B (ArXiv)
2 Feb 25: OpenAI deep research
5 Feb 25: Google Gemini 2.0 Flash Thinking
12 Feb 25: Open thoughts OpenThinker-32B (announcement)
27 Feb 25: OpenAI GPT-4.5
5 Mar 25: Qwen QwQ-32B
Evaluating the quality of “reasoning”: an example
The American Invitational Mathematics Examination (AIME) is … used to challenge bright students on the path toward choosing the team that represents the United States at the International Mathematics Olympiad. … Most AIME participants are high school students. … The AIME is a 15 question, 3 hour exam. … Each answer is an integer from 000 to 999, inclusive, making guessing almost futile. … No aids other than scratch paper, rulers and compasses are permitted. (https://artofproblemsolving.com/wiki/index.php/American_Invitational_Mathematics_Examination)
An example (AIME 2025 I, problem 1): Find the sum of all integer bases b > 9 for which 17_b is a divisor of 97_b (the subscript b denoting base-b numerals).
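Problems of this kind are easy to verify mechanically, which is precisely what makes them usable as a reward signal; a brute-force check (helper names are ours): 17_b = b + 7 and 97_b = 9b + 7, and since b + 7 must divide 9b + 7 = 9(b + 7) − 56, it must divide 56, so any generous upper bound on b suffices.

```python
def to_int(digits, base):
    """Value of the numeral with the given digits in the given base."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

# b + 7 must divide 56, so b <= 49; range up to 1000 is a safe over-bound.
valid_bases = [
    b for b in range(10, 1000)
    if to_int([9, 7], b) % to_int([1, 7], b) == 0
]
print(valid_bases, sum(valid_bases))  # -> [21, 49] 70
```

Checking: in base 21, 17_b = 28 and 97_b = 196 = 7 × 28; in base 49, 17_b = 56 and 97_b = 448 = 8 × 56.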
Some reflections: openness
A significant part of the development around GenAI is publicly accessible,�in particular on ArXiv, GitHub, and Hugging Face
This has made the dissemination of information practically instantaneous:�it took only a few days from the publication of the first open reasoning model�for other models to be developed, not only in the United States
Some reflections: accessibility and standardization
The (hardware and software) barriers to entry have been lowered
All RMs are in fact the result of the same kind of process:�a pretrained transformer is taken and further trained with some sort of CoT
Pretrained transformers are now commodities (e.g., the RMs from Simple Scaling, GAIR, and Open Thoughts are all based on Qwen 2.5-32B)
It is as if pretrained transformers have become universal enablers: “give me a pretrained transformer and I will build a language model with increasingly better capabilities”
Some reflections: targets and role of data
“Reasoning” capabilities are the result of increasingly sophisticated training, and for this purpose the quality of data is no less important than its quantity
(DeepSeek-R1-distill: 800k; OpenThinker: 100k; s1 and LIMO: 1k reasoning traces)
“While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data�(>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively�elicited with surprisingly few examples.” (https://arxiv.org/abs/2502.03387)
“Small is beautiful” then (LIMO: “Less Is MOre”...)
And data is increasingly synthetic or synthetically verified:
Some reflections: value of data
Open Thoughts: “We are a team of researchers and engineers … united around building the best datasets (and thus the best models).”
Thanks for your attention!
Luca Mari
3 Feb 2025, DeepSeek-R1: il primo modello di ragionamento aperto e autonomo (“DeepSeek-R1: the first open and autonomous reasoning model”)
5 Mar 2025, Modelli di ragionamento, ecco le tre nuove caratteristiche dell'intelligenza artificiale (più una) (“Reasoning models: here are the three new characteristics of artificial intelligence (plus one)”)