State-of-the-art Conversational AI

Ricardo Rei

01

Generative Pretrained Transformer

Generative Pretrained Transformer (GPT)

In 2018 and 2019, Alec Radford, Jeffrey Wu, and their co-workers proposed two language models trained on a very large amount of data: GPT and GPT-2. Later, at ACL 2020, Zhang et al. (2019) presented DialoGPT, a GPT-2-based model trained on a large amount of conversational data from Reddit.

Generative Pretrained Transformer (GPT)

  • These models have generic knowledge.
  • They “know” how to write and they understand language.
  • Some of them (DialoGPT) even “know” how to hold a casual conversation.

We just have to teach them new skills! For example, how to impersonate a character.

Generative Pretrained Transformer (GPT)

Since the model was pretrained on continuous text (such as Wikipedia articles), it knows nothing about the differences between history, persona, and reply. To help the model understand that these segments have different meanings, we introduce two things (a code sketch follows the list below):

  • delimiter tokens
  • segment embeddings
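
To make this concrete, here is a minimal sketch of how the delimiter tokens can be registered with a HuggingFace GPT-2 tokenizer and how the embedding matrix is resized so the new tokens get trainable vectors. The token names follow the example on the next slide; the exact names and model class used in lightning-convai may differ.

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Delimiter tokens, following the [BOS] ... [SPEAKER1] ... [SPEAKER2] ... [EOS] format.
    SPECIAL_TOKENS = {
        "bos_token": "[BOS]",
        "eos_token": "[EOS]",
        "additional_special_tokens": ["[SPEAKER1]", "[SPEAKER2]"],
    }

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Register the delimiters and grow the embedding matrix so the new ids
    # receive randomly initialised vectors that are learned during fine-tuning.
    tokenizer.add_special_tokens(SPECIAL_TOKENS)
    model.resize_token_embeddings(len(tokenizer))

The segment embeddings are passed at run time as token_type_ids; in the HuggingFace GPT-2 implementation these indices are looked up in the same embedding table as the tokens, so a common trick is to reuse the speaker token ids as segment ids (see the next sketch).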

Generative Pretrained Transformer (GPT)

  • Example:
  • Persona: I like playing football. I am from NYC
  • History: Hello, how are you?
  • Reply: I am fine, I just watched an amazing football game!

[BOS] I like playing football. I am from NYC [SPEAKER1] Hello, how are you? [SPEAKER2] I am fine, I just watched an amazing football game! [EOS]
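
Below is a hedged sketch of how this flat sequence and the matching segment ids could be built. The helper name build_input is illustrative, and reusing the speaker token ids as segment ids is one common choice, not necessarily the exact one used in lightning-convai.

    def build_input(tokenizer, persona, history, reply):
        """Flatten persona/history/reply into one token sequence plus segment ids."""
        bos, spk1, spk2, eos = tokenizer.convert_tokens_to_ids(
            ["[BOS]", "[SPEAKER1]", "[SPEAKER2]", "[EOS]"]
        )
        segments = [
            (bos,  tokenizer.encode(" ".join(persona), add_special_tokens=False)),
            (spk1, tokenizer.encode(history, add_special_tokens=False)),
            (spk2, tokenizer.encode(reply, add_special_tokens=False) + [eos]),
        ]
        input_ids, token_type_ids = [], []
        for delimiter, ids in segments:
            input_ids += [delimiter] + ids
            # Every token of a segment shares the same segment embedding.
            token_type_ids += [delimiter] * (len(ids) + 1)
        return input_ids, token_type_ids

    ids, segs = build_input(
        tokenizer,
        persona=["I like playing football.", "I am from NYC"],
        history="Hello, how are you?",
        reply="I am fine, I just watched an amazing football game!",
    )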

02

Combining Generation with Retrieval

Combining Generation with Retrieval

To help the model understand what makes a good reply, we also have to teach it what a bad reply is!

To do that, we let the model look at a possible answer and, using the last hidden state, decide whether that answer is appropriate or not.
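
Conceptually, this is a small classification head on top of the transformer: take the hidden state of the last token of the candidate sequence and project it to a single score. A minimal sketch, assuming a GPT-2 backbone from transformers (the actual head in lightning-convai may differ, e.g. it may follow the GPT2DoubleHeadsModel design):

    import torch.nn as nn

    class ReplyScorer(nn.Module):
        """Score a candidate reply from the last hidden state of the sequence."""

        def __init__(self, transformer, hidden_size=768):
            super().__init__()
            # e.g. ReplyScorer(model.transformer): share the fine-tuned GPT-2 backbone.
            self.transformer = transformer
            self.score_head = nn.Linear(hidden_size, 1)   # 768 for the base GPT-2

        def forward(self, input_ids, token_type_ids=None):
            hidden = self.transformer(
                input_ids=input_ids, token_type_ids=token_type_ids
            ).last_hidden_state                  # (batch, seq_len, hidden)
            # Assumes no right padding; with padding, index the last real token instead.
            last_hidden = hidden[:, -1, :]
            return self.score_head(last_hidden).squeeze(-1)   # one score per sequence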

Combining Generation with Retrieval

  • Example:
  • Persona: I like playing football. I am from NYC
  • History: Hello, how are you?
  • Reply: I am fine, what about you?
  • Distractor: oh I am sorry to hear that.

After tokenization, our batch contains one sequence for the correct reply and one for the distractor, each flattened with the same delimiter tokens as before (a sketch follows below).

On top of the autoregressive training, we take the last token's hidden state and produce a score for each reply. Then we maximize the score of the correct answer!
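
A hedged sketch of that objective, reusing the hypothetical build_input and ReplyScorer helpers from the previous sketches (the real training loop in lightning-convai also keeps the language-modelling loss on the gold reply; the combination shown at the end is an illustrative weighting, not the repository's exact recipe):

    import torch
    import torch.nn.functional as F

    def candidate_loss(scorer, gold_batch, distractor_batch):
        """Cross-entropy over candidate scores: push the gold reply above the distractor."""
        gold_scores = scorer(**gold_batch)              # (batch,)
        distractor_scores = scorer(**distractor_batch)  # (batch,)
        logits = torch.stack([gold_scores, distractor_scores], dim=1)   # (batch, 2)
        # The gold reply sits at index 0, so the target class is always 0.
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
        return F.cross_entropy(logits, labels)

    # Combined objective (illustrative weighting):
    # total_loss = lm_loss + 1.0 * candidate_loss(scorer, gold_batch, distractor_batch)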

03

Inference

Inference

After training the model, we can do two things:

  • Rank possible candidate answers.
  • Decode new answers.

Decoding strategies:

I am not going to cover the possible decoding strategies here, but I recommend reading the following blog post: https://huggingface.co/blog/how-to-generate

All these strategies can be tested in our code: https://github.com/HLT-MAIA/lightning-convai
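
As a quick, hedged illustration of two common strategies (greedy decoding and top-k / nucleus sampling), using the generic transformers generate API rather than the exact options exposed by lightning-convai:

    # Assumes `model` and `tokenizer` from the earlier sketches (special tokens added).
    prompt = "[BOS] I like playing football. I am from NYC [SPEAKER1] Hello, how are you? [SPEAKER2]"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Greedy decoding: always pick the most likely next token.
    greedy = model.generate(input_ids, max_new_tokens=40, do_sample=False)

    # Top-k / nucleus (top-p) sampling: sample from a truncated distribution.
    sampled = model.generate(
        input_ids,
        max_new_tokens=40,
        do_sample=True,
        top_k=50,
        top_p=0.9,
    )

    print(tokenizer.decode(sampled[0], skip_special_tokens=True))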

Inference

Ranking candidates:

Ranking candidates follows the same strategy described above: we give the model a batch with one sequence per candidate, built in the same [BOS] ... [EOS] format (a sketch follows below).

For each candidate, we take the last hidden state and produce a score. Then we only have to sort the candidates by score and pick the one with the highest score.
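
A hedged sketch of that ranking step, reusing the hypothetical build_input and ReplyScorer helpers from the earlier sketches:

    import torch

    def rank_candidates(scorer, tokenizer, persona, history, candidates):
        """Score every candidate reply and return them sorted from best to worst."""
        scored = []
        with torch.no_grad():
            for reply in candidates:
                input_ids, token_type_ids = build_input(tokenizer, persona, history, reply)
                score = scorer(
                    input_ids=torch.tensor([input_ids]),
                    token_type_ids=torch.tensor([token_type_ids]),
                )
                scored.append((reply, score.item()))
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    # best_reply, best_score = rank_candidates(scorer, tokenizer, persona, history, candidates)[0]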

04

Demo
