1 of 34

AI language models

as a teaching tool

Sebastiaan Mathôt | cogsci.nl/smathot | s.mathot@rug.nl | leiden university | june 20 2024

2 of 34

Our mission for the next 90 min

To learn

  • How large language models (LLMs) work
  • How to build our own LLM-based tools for education

I will argue that it is important to build our own LLM-based education tools so that we can use this technology on our own terms

And tell you about our efforts in Groningen to make this happen (+ open invitation to join forces)

3 of 34

Introduction to large language models

According to GPT4, this is how it functions on the inside

4 of 34

What is a large language model?

Software that takes a sequence of words (“tokens”) and predicts the next word

  • “Software that takes” → “a”

This prediction is added to the input (“autoregression”), and then the next word is predicted, and so forth

  • “Software that takes a” → “sequence”

Using very large “transformer” neural networks

  • 10 - 1000 billion parameters
  • Quality depends on size and training

5 of 34

Tokens and embeddings

Text is segmented into meaningful chunks, or “tokens”

  • Similar to morphemes

Each token is represented as an “embedding”

  • Hundreds of continuous values
  • Conceptually related tokens have similar embeddings
  • Embeddings are (usually) learned

6 of 34

Tokens and embeddings

  • Embeddings are not only used as input for LLMs!
  • Embeddings contain semantic information and can therefore be used for fuzzy “vector” search
    • Searching for “awareness” also matches text containing “consciousness”
  • Use case: retrieving sections from a textbook that match a question
    • We will see an example of this later in Heymans

7 of 34

Positional encoding

Tokens are processed independently

  • The serial position of tokens is lost!

To overcome this, an embedding that represents position is added to each token embedding

  • ‘Cat’ is represented differently in ‘The cat’ and ‘Cat food’
  • Positional-encoding embeddings are also learned

8 of 34

Everybody thinks they know what attention is

“Attention layers” consisting of multiple “attention heads” lie at the heart of the transformer architecture

  • Somewhat similar to convolutional layers and filters in image-recognition networks

Attention heads recode tokens as weighted combinations of all tokens, including itself

  • Attention refers to the weights
  • Which are again learned

9 of 34

Layers and attention

This is done over and over again in hundreds of layers consisting of hundreds of heads

  • Separately for each token
  • Purely feedforward

Tokens in later layers represent increasingly complex relationships between the input tokens

The final encoding of the last token is the prediction of the next token!

Prediction is deterministic (randomness is added post-hoc)

10 of 34

LLM training: Step 1

First step: text prediction

  • Take billions of texts from the internet
  • Use a text fragment (“dogs bark”) as input
  • Mask the last word (“dogs …”)
  • Predict the next word (“meow”)
  • Use backpropagation (a training algorithm) to update the network to make better predictions (“bark”)

This works but predictions tend complete rather than reply

  • Prompt: “Provide a psychological assessment of Kanye West.”
  • Prediction: “Follow APA6 diagnostic criteria.”

11 of 34

LLM training: Step 2

Second step: reinforcement learning

  • Have different predictions rated by humans
  • The network is updated to give the highest-rated prediction

This makes the predictions more conversational

12 of 34

LLM knowledge

LLM knowledge is like long-term (semantic) memory

  • Boundaries are fuzzy
  • Can lead to confabulations or “hallucinations”

LLM “prompts” are like working memory

  • Relevant information can be inserted into the conversation
  • Possibly hidden from the user
  • Vastly reduces hallucinations (in advanced models)
  • We will again see an example of this in Heymans

13 of 34

By turning this part of the conversation into a black box you get an AI tutor that has read the textbook

14 of 34

By turning this part of the conversation into a black box

you get an AI tutor that has read the textbook

15 of 34

From word prediction to conceptual understanding

Do LLMs understand things or do they simply make statistical predictions (“stochastic parrots”)?

  • And what’s the difference?

Predictions at different levels of abstraction

  • Basic word associations
    • Dogs … → bark
  • Higher-level word associations and parallelism
    • Dogs bark, and … … → cats meow
  • Higher-level goals
    • Translate “Dogs bark, and cats meow” to Chinese. … … … … → 狗叫,猫喵

16 of 34

Different ways of using LLMs

Not just chat.openai.com

17 of 34

Ways of using LLMs

Web interface for chatting

  • chat.openai.com
  • chat.mistral.ai
  • gemini.google.com
  • sigmundai.eu (our own chatbot)

Programmatically (API access)

  • For integration with custom software

Different ways of using the same technology

18 of 34

Some available options

Proprietary

  • ChatGPT / OpenAI (US)
  • Claude / Anthropic (US)
  • Gemini / Google (US)

Top proprietary models are state of the art

Open-source models are less capable, but rapidly becoming realistic alternatives

(Partly) open source

  • Mistral (France)­

Largely open, but top model is closed

  • LLama / Meta (France)

Main driver of open-source AI

  • SigmundAI (NL, ours)

Not a model, but open-source chat interface to various models

  • HuggingFace (France)

Not a model, but hub for open-source AI

19 of 34

An example use case: Heymans

AI tutor for Introduction to Psychology

20 of 34

Heymans pilot

Prototype AI tutor for formative assessment

  • https://heymans.cogsci.nl

Used in Introduction to Psychology/ Overzicht van de Psychology (2023-2024)

  • One open question per chapter
  • Automated pass/fail grading
  • Required for access to exam

21 of 34

Heymans pilot: architecture

Two modes

  • Practice: Heymans asks questions and provides feedback until student answers correctly
  • Q&A: Heymans answers questions based on information found in textbook through vector search (remember embeddings?)

Custom software written by us and running on our own server

Uses GPT4 through OpenAI API

Student

Heymans server

Data storage

OpenAI server

No data storage

Textbook

Chat through web app

Get text from textbook

API access

22 of 34

Heymans pilot: practice prompt

You are a friendly tutor for an introductory psychology course. Your name is Heymans. You are about to chat with a student named {{ name }} about the excerpt from a textbook below. The student is a beginner, so keep questions and feedback simple.

<textbook>

{{ source }}

</textbook>

The chat session is structured as follows:

- Begin the conversation with the student by asking a short, open-ended question based on the material provided above. Indicate which section is the basis for the question.

- Evaluate the student's response to determine if it sufficiently demonstrates understanding of the concept(s).

- If the response does not connect to the question, remind the student that the assignment should be taken seriously.

- If the response resembles the textbook or your own feedback, remind the student to use his or her own words.

- If the response is satisfactory, conclude the teaching session. Do not offer to continue the conversation. End your response with <FINISHED>.

- If it the response is not satisfactory, provide constructive feedback and suggestions for improvement.

- After providing feedback, allow the student to respond with an improved answer. Continue this feedback cycle until the answer demonstrates a satisfactory understanding of the concept(s).

Remember to keep the questions simple and concrete.

23 of 34

Heymans pilot: student engagement

24 of 34

Heymans pilot: cost

API calls cost money

At the time of this pilot

  • Around €1900 for both courses together
  • Around €0.10 per engagement

Right now already much cheaper

  • Prices have gone down by a factor of ±6
  • Room for optimization in software

But costs will remain substantial

  • And are not invested in our community! (For now at least—we’ll get to a roadmap later)

25 of 34

Other use cases

Ask not what you can do for your LLM. Ask what your LLM can do for you.

26 of 34

What are LLMs good for?

Demanding linguistic tasks that

  • Are time consuming
  • Are unpleasant to do
  • And do not provide meaningful interaction with students

So that we free up time for meaningful activities

27 of 34

What are LLMs good for?

Such as

  • Scoring open-ended exam questions
  • Interactive quizzes (Heymans-style)
  • Creating multiple-choice questions
  • Validating multiple-choice questions
  • Knowledge bases (textbooks, university regulations, etc.)

But probably not (at this point)

  • Grading or giving feedback on essays or theses
  • Communication with students

28 of 34

Roadmap

Towards using LLMs as a teaching tool on our own terms

29 of 34

An opinionated prediction

AI-based teaching products will soon be offered by every academic publisher and big-tech company

These products will

  • Reflect norms and values that we may not share
  • Be expensive
  • Be hard to get rid of once we buy into them

Therefore we can and should find a middle ground between developing our own products and using commercial tools

  • We should not underestimate the resources and expertise at universities. The main barriers are organizational.

30 of 34

Organizational challenges

We should not underestimate the resources and expertise at universities

  • Consider: Mistral leads open-source AI with a team of <20
  • This corresponds to a fraction of university resources

The main barriers at universities are organizational

  • Extreme risk aversion
  • A culture of meetings and committees
  • Lack of coordinated action

If we as universities overcome these barriers, we can accomplish a lot

31 of 34

Roadmap

In Groningen, we have formed a small team to implement AI teaching tools, building on the Heymans prototype

Initial focus

  • Grading of open-ended exams
  • Formative quizzes (as in Intro Psych)
  • Knowledge base
  • Integration with learning environments (we target Brightspace, but use a protocol that also works with Canvas)

Everything will be open source

Contributions are welcome (especially development resources)

32 of 34

33 of 34

More ambitious projects

Heymans will have to use a commercial API

  • OpenAI (GPT4), Anthropic (Claude 3), or Mistral Large

In the longer term, can we run our own model on the Groningen Hábrok cluster?

  • Technically challenging
  • What would we gain?
  • Are open models sufficiently capable?

34 of 34

And remember …

There is nothing magical about AI language models

Thank you!

Sebastiaan Mathôt | cogsci.nl/smathot | s.mathot@rug.nl | leiden university | june 20 2024