Customized Knowledge for AI
Terms of Use
Except where otherwise indicated, the contents of this slide presentation are available for use under the Creative Commons Attribution 4.0 license.
You are free to adapt and share the work, but you must give appropriate credit, provide a link to the license, and indicate if changes were made.
Sample attribution: [Title of work] by Fred Hutchinson Cancer Center Data Science Lab. CC-BY 4.0
Don’t ask your primary care physician how to fix your motorbike
Neither should you depend on an AI model for something it isn’t trained for
[Diagram: a model built from training data and algorithms takes an input ("What is this?") and returns an output ("It’s an apple.")]
CC-by hutchdatascience.org
What’s your use case?
Customized Knowledge
My input questions are domain specific.
CC-by hutchdatascience.org
Getting a better output from an AI model
From lowest to highest investment:
Prompt engineering: the user of the model asks better questions
Prompt-tuning: make the model better, but efficiently
Fine-tuning a model: make the model better, but with a bit more investment
Training from scratch: make a whole new ChatGPT; this is prohibitively expensive
Sources: https://hbr.org/2023/07/how-to-train-generative-ai-using-your-companys-data
CC-by hutchdatascience.org
Sometimes it's not the model that needs training
It's the user
Prompt Engineering
Source: Screenshot from ChatGPT
CC-by hutchdatascience.org
1. Know the model’s strengths and weaknesses
2. Be as specific as possible
3. Utilize contextual prompts
4. Provide AI models with examples
5. Experiment with prompts and personas
6. Try chain-of-thought prompting
Source: https://cloud.google.com/blog/products/application-development/five-best-practices-for-prompt-engineering
Best practices for prompt engineering
CC-by hutchdatascience.org
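To make the best practices above concrete, here is a minimal Python sketch that assembles a specific, contextual, few-shot, chain-of-thought prompt with a persona (practices 2 through 6). The `ask_model` function, the persona, and the example question are hypothetical placeholders, not part of any particular platform's API; only the way the prompt is built is the point.

```python
# Hypothetical helper; swap in whichever chat interface or API you actually use.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("Send `prompt` to your AI platform of choice.")

# Persona + context: say who the model should act as and what setting it is in.
persona = "You are a bioinformatics analyst helping a cancer research lab."
context = "We work with RNA-seq count matrices produced by a standard pipeline."

# Examples: show the input/output format you expect (few-shot prompting).
examples = (
    "Q: What does a high dispersion estimate mean for a gene?\n"
    "A: Its counts vary more across replicates than the mean alone predicts.\n"
)

# Chain-of-thought: ask the model to reason step by step before answering.
question = (
    "Q: My differential expression results have no significant genes. "
    "Think through the likely causes step by step, then list the top three to check.\n"
    "A:"
)

prompt = "\n\n".join([persona, context, examples, question])
print(prompt)                  # Inspect the assembled prompt...
# answer = ask_model(prompt)   # ...then send it to the platform you are testing.
```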
https://poe.com/ and https://gpt.h2o.ai/
Their differing training and algorithms will lead to differing results
Test out prompts on multiple AI platforms
These websites host multiple competing models in one place
CC-by hutchdatascience.org
Source: https://research.ibm.com/blog/what-is-ai-prompt-tuning
Prompt tuning or “P-tuning”
CC-by hutchdatascience.org
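For readers who want to see what prompt tuning looks like in code, below is a minimal sketch assuming the Hugging Face transformers and peft libraries; the bloomz-560m base model and the initialization text are arbitrary choices for illustration. Only the small set of virtual prompt tokens is trained, while every weight of the base model stays frozen.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = "bigscience/bloomz-560m"  # a small open model, chosen only to keep the example light
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Prompt tuning ("P-tuning"): learn a handful of virtual prompt token embeddings
# that get prepended to every input, leaving the base model's weights untouched.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Answer this domain-specific research question:",
    num_virtual_tokens=8,
    tokenizer_name_or_path=base,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # a tiny fraction of the full model

# From here, peft_model trains like any other transformers model
# (e.g. with transformers.Trainer) on your domain-specific examples.
```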
A type of prompt tuning is offered by https://gpt.h2o.ai/ – go to the Expert tab
Test out prompt tuning
CC-by hutchdatascience.org
Pictures from OpenMoji.org
Training a baby from scratch for a specialized job would be costly and inefficient
Instead, you find a person who has a lot of the training you need and then fine-tune their skills.
Training from scratch vs. fine tuning
CC-by hutchdatascience.org
Pictures from OpenMoji.org
Source: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
ChatGPT cost ~$100 million to create
Training models from scratch requires enormous amounts of data and computing resources
It’s almost never where you will want to start.
Training from scratch
CC-by hutchdatascience.org
Before exploring fine tuning:
Are you sure no other model works?
(If you’ve only tried ChatGPT, go try other AI platforms)
https://www.nature.com/articles/d41586-023-03023-4
CC-by hutchdatascience.org
https://huggingface.co/blog/large-language-models
Finding an open source base model to train for your purposes:
Fine tuning
train an existing model to do the job better
CC-by hutchdatascience.org
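Below is a minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries; the DistilGPT-2 base model and the two placeholder sentences are illustrative choices, not recommendations. In practice your cleaned, domain-specific corpus would replace the placeholder dataset.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "distilgpt2"  # a small open base model, used here only to keep the sketch cheap
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder domain text; in practice this would be your cleaned domain corpus.
corpus = Dataset.from_dict({"text": [
    "Tumor purity estimates below 0.3 were excluded from the cohort.",
    "RNA-seq libraries were prepared with poly-A selection.",
]})
tokenized = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # updates the base model's weights on your domain data
```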
https://s10251.pcdn.co/wp-content/uploads/2023/03/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.png
CC-by hutchdatascience.org
*More on these items in future chapters
Fine tuning
train an existing model to do the job better
CC-by hutchdatascience.org
*More on this in the next chapter
| What is needed | P-tuning Docs | Fine Tuning Docs | For use with protected data? |
| --- | --- | --- | --- |
| | | | No |
| Can be used from GUI or command line; data cleaning likely needed | | | Not from the GUI, but the command line could if built right |
| Python needed; data cleaning likely needed | | | If you build it right* |
Example starting points for Fine or P-tuning
CC-by hutchdatascience.org
[Diagram: project phases, with a "You are here" marker]
Ongoing monitoring throughout the project
CC-by hutchdatascience.org