ChatGPT

Jun Ji

2023.02.10.

References

Ex : Coding (https://chat.openai.com/auth/login)

Q: Plot a linear regression with Python using matplotlib

A: https://colab.research.google.com/
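
The slide does not reproduce the generated answer, so here is a minimal sketch of what such an answer could look like. The data are synthetic (the line y = 2x + 1 plus noise is an assumption made for illustration), and the helper names `fit_line`, `make_data` and `plot_regression` are hypothetical. The fit is ordinary least squares written in plain Python; matplotlib is used only for drawing.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_data(n=50, seed=0):
    """Synthetic example data (an assumption): y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [10 * i / (n - 1) for i in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def plot_regression(xs, ys, path=None):
    """Scatter the points and draw the fitted line with matplotlib."""
    # Imported here so that fit_line can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    slope, intercept = fit_line(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label="data")
    x_ends = [min(xs), max(xs)]
    y_ends = [slope * x + intercept for x in x_ends]
    ax.plot(x_ends, y_ends, color="red",
            label=f"y = {slope:.2f}x + {intercept:.2f}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    if path is None:
        plt.show()
    else:
        fig.savefig(path)
    return fig
```

In a Colab cell, `plot_regression(*make_data())` would display the scatter plot with the fitted line.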

Ex : Math (https://chat.openai.com/auth/login)

Q: Arun is 2 years older than Bharat, who is twice as old as Charat. If the total of the ages of Arun, Bharat and Charat is 27, how old is Bharat?
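
The slide shows only the question, so the short check below works it through. Let Charat's age be c; then Bharat is 2c and Arun is 2c + 2, so the total is 5c + 2 = 27. The function name `solve_ages` is hypothetical, and the brute-force search over whole-number ages is just one simple way to confirm the algebra.

```python
def solve_ages(total=27):
    """Find whole-number ages with Bharat = 2 * Charat and Arun = Bharat + 2."""
    for charat in range(1, total):
        bharat = 2 * charat
        arun = bharat + 2
        if arun + bharat + charat == total:
            return arun, bharat, charat
    return None


arun, bharat, charat = solve_ages()
print(f"Arun={arun}, Bharat={bharat}, Charat={charat}")
```

With a total of 27 this gives c = 5, so Bharat is 10 years old (Arun is 12, Charat is 5).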

Ex : Writing & Summary (https://chat.openai.com/auth/login)

Q: Please logically describe the need for AI education in college

Q : Summarize it as a slide for your presentation.

Rapid advancement in AI technology: AI is making a big impact in various industries and sectors.
High demand for AI professionals: AI is making a big impact in various industries and sectors.
Problem-solving potential: AI education equips students with the necessary knowledge, skills, and tools to succeed in the field.
Contributing to society: By providing students with AI skills, universities can help them solve challenges in areas like health, transportation, energy, and finance.
Preparing for the future: AI is shaping the future of work. Universities can prepare students to succeed in this rapidly changing technological landscape.

In conclusion, university AI education is necessary to prepare students for the rapidly changing technological landscape and equip them with the knowledge and skills necessary to tackle complex problems and make positive contributions to society.

ChatGPT

GPT

Language Models are Few-Shot Learners, Brown et al. (2020)
Implemented as a Transformer decoder
Generative Pre-trained Transformer

GPT - unsupervised pre-training

GPT - masked multi-head attention
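
The figure for this slide is not available, so the following is a rough sketch of the masking idea only. It shows a single attention head for brevity (multi-head attention runs several such heads in parallel and concatenates their outputs), and it omits the learned query/key/value projections. The function names are hypothetical. Position i may attend only to positions up to and including i, which is what lets a decoder predict the next token without seeing it.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal (look-back) mask.

    queries, keys, values: lists of equal-length vectors, one per position.
    Returns (outputs, attention_weights).
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if j > i:
                # Future position: masked out, gets zero weight after softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, weights
```

For example, calling `causal_attention(x, x, x)` on a three-position input yields a lower-triangular weight matrix: the first position attends only to itself, the second to the first two, and so on.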

Step 1 - Supervised Fine Tuning (SFT) Model

Prompts collected from the OpenAI API and prompts hand-written by labelers were compiled into 13,000 input/output samples for training the supervised model.

The GPT-3 model was then fine-tuned using this new, supervised dataset, to create GPT-3.5, also called the SFT model.

Step 2 - Reward Model

To train the reward model, labelers are presented with 4 to 9 SFT model outputs for a single input prompt. They rank these outputs from best to worst, and each ranking is expanded into combinations of ranked output pairs.
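
As a rough illustration of how one ranking turns into training pairs: K ranked outputs give K(K-1)/2 (preferred, rejected) pairs, so 4 outputs give 6 pairs and 9 outputs give 36. The slides do not state the training loss. The pairwise logistic loss below, -log(sigmoid(r_preferred - r_rejected)), is an assumption based on the usual formulation for reward models trained from rankings, and the function names are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each pair
    is always ranked above the second.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


# Four hypothetical outputs for one prompt, already ranked best to worst.
pairs = ranking_to_pairs(["A", "B", "C", "D"])
print(len(pairs), pairs[0], pairs[-1])
```

The loss is small when the reward model scores the preferred output well above the rejected one, and large when it gets the order wrong.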

Step 3 - Reinforcement Learning Model

In the final stage, the model is presented with a random prompt and returns a response. The response is generated using the ‘policy’ that the model has learned in step 1. Based on the reward model developed in step 2, a scalar reward value is then determined for the prompt and response pair. The reward then feeds back into the model to evolve the policy.
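
The loop described above (sample a response, score it, update the policy) can be sketched with a toy example. Everything here is an assumption for illustration: the "policy" is a softmax over three canned responses rather than a language model, the "reward model" is a hypothetical lookup table, and the update is plain REINFORCE with a baseline. Systems of this kind are usually trained with PPO, which this sketch does not implement.

```python
import math
import random

RESPONSES = ["unhelpful reply", "partly helpful reply", "helpful reply"]
TOY_REWARDS = [0.1, 0.5, 0.9]  # hypothetical reward-model scores


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_model(response_index):
    """Stand-in for the step 2 reward model: returns a scalar reward."""
    return TOY_REWARDS[response_index]


def train_policy(steps=500, lr=0.5, seed=0):
    """REINFORCE on a softmax policy over the canned responses."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # The policy samples a response.
        action = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        # The reward model scores it.
        reward = reward_model(action)
        # Baseline: expected reward under the current policy (cheap in this toy).
        baseline = sum(p * reward_model(i) for i, p in enumerate(probs))
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit j is (1[j == action] - probs[j]).
        for j in range(len(logits)):
            indicator = 1.0 if j == action else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
    return softmax(logits)


final_probs = train_policy()
for text, p in zip(RESPONSES, final_probs):
    print(f"{p:.3f}  {text}")
```

Over the course of training the policy shifts probability away from the low-reward response and toward the high-reward one, which is the "reward feeds back into the model" step in miniature.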
