ChatGPT
Jun Ji
2023.02.10.
Ex : Coding (https://chat.openai.com/auth/login)
Ex : Math (https://chat.openai.com/auth/login)
Q: Arun is 2 years older than Bharat who is twice as old as Charat. If the total of the ages of Arun, Bharat and Charat be 27, then how old is Bharat?
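The age question can be checked by hand with one unknown. The sketch below is only an illustration of that arithmetic; the function name `solve_ages` is hypothetical and not part of the original example.

```python
def solve_ages(total=27):
    # Let Charat's age be c. Then Bharat is 2c and Arun is 2c + 2.
    # Sum of ages: c + 2c + (2c + 2) = 5c + 2 = total, so c = (total - 2) / 5.
    charat = (total - 2) // 5
    bharat = 2 * charat
    arun = bharat + 2
    # Sanity check that the three ages really add up to the given total.
    assert arun + bharat + charat == total
    return arun, bharat, charat

arun, bharat, charat = solve_ages()
print(bharat)  # 10
```

With a total of 27, this gives Charat = 5, Bharat = 10, and Arun = 12.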
Ex : Writing & Summary (https://chat.openai.com/auth/login)
Q : Please logically describe the need for AI education in college.
Q : Summarize it as a slide for your presentation.
In conclusion, university AI education is necessary to prepare students for the rapidly changing technological landscape and equip them with the knowledge and skills necessary to tackle complex problems and make positive contributions to society.
ChatGPT
GPT
GPT - unsupervised pre-training
GPT - masked multi-head attention
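As a rough illustration of the masking idea, the sketch below applies a causal mask to a small matrix of attention scores for a single head, so each position can only attend to itself and earlier positions. Multi-head attention would run several such heads in parallel. The function name and the example scores are assumptions made for illustration.

```python
import math

def causal_attention_weights(scores):
    """Softmax each row of a square score matrix, hiding future positions."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only look at positions 0..i; later ones are masked out.
        visible = scores[i][: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Masked (future) positions receive a weight of exactly zero.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

# Hypothetical raw attention scores for a 3-token sequence.
scores = [[0.0, 1.0, 2.0],
          [1.0, 1.0, 5.0],
          [0.5, 0.5, 0.5]]
weights = causal_attention_weights(scores)
```

The first token attends only to itself, and the large score of 5.0 in the second row has no effect because it belongs to a future position.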
Step 1 - Supervised Fine Tuning (SFT) Model
Prompts collected from the OpenAI API, together with prompts hand-written by labelers, yielded 13,000 input / output samples for training the supervised model.
The GPT-3 model was then fine-tuned using this new, supervised dataset, to create GPT-3.5, also called the SFT model.
Step 2 - Reward Model
To train the reward model, labelers are presented with 4 to 9 SFT model outputs for a single input prompt. They are asked to rank these outputs from best to worst, and each ranking is then broken down into pairwise comparisons between outputs.
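The ranking step can be sketched as follows: a ranking of K outputs yields K(K-1)/2 (preferred, rejected) pairs. The pairwise loss shown here is a common choice for training reward models on such pairs and is included as an assumption, since the slide does not state the loss. Function names are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_outputs):
    """Given outputs ordered best to worst, return (preferred, rejected) pairs."""
    # combinations keeps the input order, so the first item is always the better one.
    return list(combinations(ranked_outputs, 2))

def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))

pairs = ranking_to_pairs(["A", "B", "C", "D"])  # a labeler's ranking of 4 outputs
print(len(pairs))  # 6
```

The loss is small when the reward model scores the preferred output above the rejected one, and large when it gets the order wrong. A ranking of 9 outputs would produce 36 pairs.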
Step 3 - Reinforcement Learning Model
In the final stage, the model is presented with a random prompt and returns a response. The response is generated using the ‘policy’ that the model has learned in step 1. Based on the reward model developed in step 2, a scalar reward value is then determined for the prompt and response pair. The reward then feeds back into the model to update the policy.
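The loop described above can be sketched with a toy example. Everything here is an assumption for illustration: the policy is a softmax over three canned responses, the reward model is a fixed lookup table, and the update is a simple REINFORCE-style policy gradient chosen for brevity, which is not necessarily the algorithm used to train ChatGPT.

```python
import math
import random

# Hypothetical candidate responses the toy policy can choose between.
RESPONSES = ["helpful answer", "vague answer", "off-topic answer"]

def reward_model(prompt, response):
    """Stand-in for the step 2 reward model: returns a scalar score."""
    scores = {"helpful answer": 1.0, "vague answer": 0.0, "off-topic answer": -1.0}
    return scores[response]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)  # the policy's parameters
    for _ in range(steps):
        probs = softmax(logits)
        # The policy samples a response for the prompt.
        i = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        # The reward model scores the (prompt, response) pair.
        reward = reward_model("some prompt", RESPONSES[i])
        # The reward feeds back into the policy: raise the log-probability
        # of rewarded responses and lower it for penalized ones.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)

final_probs = train()
```

After training, the toy policy assigns its highest probability to the response that the reward model scores best.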