2 of 34

Our mission for the next 90 min

To learn

How large language models (LLMs) work
How to build our own LLM-based tools for education

I will argue that it is important to build our own LLM-based education tools so that we can use this technology on our own terms

And tell you about our efforts in Groningen to make this happen (+ open invitation to join forces)

3 of 34

Introduction to large language models

According to GPT4, this is how it functions on the inside

4 of 34

What is a large language model?

Software that takes a sequence of words (“tokens”) and predicts the next word

“Software that takes” → “a”

This prediction is added to the input (“autoregression”), and then the next word is predicted, and so forth

“Software that takes a” → “sequence”

Using very large “transformer” neural networks

10 - 1000 billion parameters
Quality depends on size and training

5 of 34

Tokens and embeddings

Text is segmented into meaningful chunks, or “tokens”

Similar to morphemes

Each token is represented as an “embedding”

Hundreds of continuous values
Conceptually related tokens have similar embeddings
Embeddings are (usually) learned

6 of 34

Tokens and embeddings

Embeddings are not only used as input for LLMs!
Embeddings contain semantic information and can therefore be used for fuzzy “vector” search

Searching for “awareness” also matches text containing “consciousness”

Use case: retrieving sections from a textbook that match a question

We will see an example of this later in Heymans

7 of 34

Positional encoding

Tokens are processed independently

The serial position of tokens is lost!

To overcome this, an embedding that represents position is added to each token embedding

‘Cat’ is represented differently in ‘The cat’ and ‘Cat food’
Positional-encoding embeddings are also learned

8 of 34

Everybody thinks they know what attention is

“Attention layers” consisting of multiple “attention heads” lie at the heart of the transformer architecture

Somewhat similar to convolutional layers and filters in image-recognition networks

Attention heads recode tokens as weighted combinations of all tokens, including itself

Attention refers to the weights
Which are again learned

9 of 34

Layers and attention

This is done over and over again in hundreds of layers consisting of hundreds of heads

Separately for each token
Purely feedforward

Tokens in later layers represent increasingly complex relationships between the input tokens

The final encoding of the last token is the prediction of the next token!

Prediction is deterministic (randomness is added post-hoc)

10 of 34

LLM training: Step 1

First step: text prediction

Take billions of texts from the internet
Use a text fragment (“dogs bark”) as input
Mask the last word (“dogs …”)
Predict the next word (“meow”)
Use backpropagation (a training algorithm) to update the network to make better predictions (“bark”)

This works but predictions tend complete rather than reply

Prompt: “Provide a psychological assessment of Kanye West.”
Prediction: “Follow APA6 diagnostic criteria.”

11 of 34

LLM training: Step 2

Second step: reinforcement learning

Have different predictions rated by humans
The network is updated to give the highest-rated prediction

This makes the predictions more conversational

12 of 34

LLM knowledge

LLM knowledge is like long-term (semantic) memory

Boundaries are fuzzy
Can lead to confabulations or “hallucinations”

LLM “prompts” are like working memory

Relevant information can be inserted into the conversation
Possibly hidden from the user
Vastly reduces hallucinations (in advanced models)
We will again see an example of this in Heymans

13 of 34

By turning this part of the conversation into a black box you get an AI tutor that has read the textbook

14 of 34

By turning this part of the conversation into a black box

you get an AI tutor that has read the textbook

15 of 34

From word prediction to conceptual understanding

Do LLMs understand things or do they simply make statistical predictions (“stochastic parrots”)?

And what’s the difference?

Predictions at different levels of abstraction

Basic word associations

Dogs … → bark

Higher-level word associations and parallelism

Dogs bark, and … … → cats meow

Higher-level goals

Translate “Dogs bark, and cats meow” to Chinese. … … … … → 狗叫，猫喵

16 of 34

Different ways of using LLMs

Not just chat.openai.com

17 of 34

Ways of using LLMs

Web interface for chatting

chat.openai.com
chat.mistral.ai
gemini.google.com
sigmundai.eu (our own chatbot)

Programmatically (API access)

For integration with custom software

Different ways of using the same technology

18 of 34

Some available options

Proprietary

ChatGPT / OpenAI (US)
Claude / Anthropic (US)
Gemini / Google (US)

Top proprietary models are state of the art

Open-source models are less capable, but rapidly becoming realistic alternatives

(Partly) open source

Mistral (France)

Largely open, but top model is closed

LLama / Meta (France)

Main driver of open-source AI

SigmundAI (NL, ours)

Not a model, but open-source chat interface to various models

HuggingFace (France)

Not a model, but hub for open-source AI

19 of 34

An example use case: Heymans

AI tutor for Introduction to Psychology

20 of 34

Heymans pilot

Prototype AI tutor for formative assessment

https://heymans.cogsci.nl

Used in Introduction to Psychology/ Overzicht van de Psychology (2023-2024)

One open question per chapter
Automated pass/fail grading
Required for access to exam

21 of 34

Heymans pilot: architecture

Two modes

Practice: Heymans asks questions and provides feedback until student answers correctly
Q&A: Heymans answers questions based on information found in textbook through vector search (remember embeddings?)

Custom software written by us and running on our own server

Uses GPT4 through OpenAI API

Student

Heymans server

Data storage

OpenAI server

No data storage

Textbook

Chat through web app

Get text from textbook

API access

22 of 34

Heymans pilot: practice prompt

You are a friendly tutor for an introductory psychology course. Your name is Heymans. You are about to chat with a student named {{ name }} about the excerpt from a textbook below. The student is a beginner, so keep questions and feedback simple.

</textbook>

The chat session is structured as follows:

- Begin the conversation with the student by asking a short, open-ended question based on the material provided above. Indicate which section is the basis for the question.

- Evaluate the student's response to determine if it sufficiently demonstrates understanding of the concept(s).

- If the response does not connect to the question, remind the student that the assignment should be taken seriously.

- If the response resembles the textbook or your own feedback, remind the student to use his or her own words.

- If the response is satisfactory, conclude the teaching session. Do not offer to continue the conversation. End your response with <FINISHED>.

- If it the response is not satisfactory, provide constructive feedback and suggestions for improvement.

- After providing feedback, allow the student to respond with an improved answer. Continue this feedback cycle until the answer demonstrates a satisfactory understanding of the concept(s).

Remember to keep the questions simple and concrete.

23 of 34

Heymans pilot: student engagement

24 of 34

Heymans pilot: cost

API calls cost money

At the time of this pilot

Around €1900 for both courses together
Around €0.10 per engagement

Right now already much cheaper

Prices have gone down by a factor of ±6
Room for optimization in software

But costs will remain substantial

And are not invested in our community! (For now at least—we’ll get to a roadmap later)

25 of 34

Other use cases

Ask not what you can do for your LLM. Ask what your LLM can do for you.

26 of 34

What are LLMs good for?

Demanding linguistic tasks that

Are time consuming
Are unpleasant to do
And do not provide meaningful interaction with students

So that we free up time for meaningful activities

27 of 34

What are LLMs good for?

Such as

Scoring open-ended exam questions
Interactive quizzes (Heymans-style)
Creating multiple-choice questions
Validating multiple-choice questions
Knowledge bases (textbooks, university regulations, etc.)

But probably not (at this point)

Grading or giving feedback on essays or theses
Communication with students

28 of 34

Roadmap

Towards using LLMs as a teaching tool on our own terms

29 of 34

An opinionated prediction

AI-based teaching products will soon be offered by every academic publisher and big-tech company

These products will

Reflect norms and values that we may not share
Be expensive
Be hard to get rid of once we buy into them

Therefore we can and should find a middle ground between developing our own products and using commercial tools

We should not underestimate the resources and expertise at universities. The main barriers are organizational.

30 of 34

Organizational challenges

We should not underestimate the resources and expertise at universities

Consider: Mistral leads open-source AI with a team of <20
This corresponds to a fraction of university resources

The main barriers at universities are organizational

Extreme risk aversion
A culture of meetings and committees
Lack of coordinated action

If we as universities overcome these barriers, we can accomplish a lot

31 of 34

Roadmap

In Groningen, we have formed a small team to implement AI teaching tools, building on the Heymans prototype

Initial focus

Grading of open-ended exams
Formative quizzes (as in Intro Psych)
Knowledge base
Integration with learning environments (we target Brightspace, but use a protocol that also works with Canvas)

Everything will be open source

Contributions are welcome (especially development resources)

33 of 34

More ambitious projects

Heymans will have to use a commercial API

OpenAI (GPT4), Anthropic (Claude 3), or Mistral Large

In the longer term, can we run our own model on the Groningen Hábrok cluster?

Technically challenging
What would we gain?
Are open models sufficiently capable?

34 of 34

And remember …

There is nothing magical about AI language models

Thank you!

Sebastiaan Mathôt | cogsci.nl/smathot | s.mathot@rug.nl | leiden university | june 20 2024

1 of 34