1 of 19

Customized Knowledge for AI

2 of 19

Terms of Use

Except where otherwise indicated, the contents of this slide presentation are available for use under the Creative Commons Attribution 4.0 license.

You are free to adapt and share the work, but you must give appropriate credit, provide a link to the license, and indicate if changes were made.

Sample attribution: [Title of work] by Fred Hutchinson Cancer Center Data Science Lab. CC-BY 4.0

3 of 19

Don’t ask your primary care physician about how to fix your motorbike

Neither should you depend on an AI model for something it isn’t trained for



4 of 19

[Diagram: an AI model built from Training Data and Algorithms turns an Input ("What is this?") into an Output ("It's an apple."). What's your use case? If your input questions are domain specific, you need Customized Knowledge.]


5 of 19

Getting a better output from an AI model

Options range from lowest to highest investment:

  • Prompt engineering: the user of the model asks better questions (lowest investment)
  • Prompt-tuning: make the model better, but efficiently
  • Fine-tuning a model: make the model better, but with a bit more investment
  • Training from scratch: make a whole new ChatGPT; this is prohibitively expensive (highest investment)

Source: https://hbr.org/2023/07/how-to-train-generative-ai-using-your-companys-data


6 of 19

Prompt Engineering

Sometimes it's not the model that needs training; it's the user.

Source: screenshot from ChatGPT


7 of 19

Best practices for prompt engineering

1. Know the model’s strengths and weaknesses

2. Be as specific as possible

3. Utilize contextual prompts

4. Provide AI models with examples

5. Experiment with prompts and personas

6. Try chain-of-thought prompting

Source: https://cloud.google.com/blog/products/application-development/five-best-practices-for-prompt-engineering
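A minimal sketch of what practices 2 through 6 can look like when assembling a prompt, assuming a chat-style model. The persona, the abstract-classification task, and the example texts are illustrative placeholders, not part of the original slides.

```python
# A sketch of practices 2-6: be specific, give context, provide examples,
# set a persona, and ask for chain-of-thought reasoning. Paste the assembled
# prompt into any chat AI platform, or send it to a chat-completion API.

def build_prompt(abstract: str) -> list[dict]:
    """Assemble a chat-style prompt for classifying a journal abstract."""
    persona = ("You are a biomedical research assistant who classifies "
               "journal abstracts by study type.")
    few_shot = [  # examples show the model exactly what output you expect
        ("We enrolled 120 patients and randomized them to two arms...", "clinical trial"),
        ("We simulated tumor growth with an agent-based model...", "computational study"),
    ]
    messages = [{"role": "system", "content": persona}]
    for text, label in few_shot:
        messages.append({"role": "user", "content": f"Abstract: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({
        "role": "user",
        "content": (f"Abstract: {abstract}\n"
                    "Classify this abstract as 'clinical trial', "
                    "'computational study', or 'other'. Reason step by step, "
                    "then give the final label on its own line."),
    })
    return messages


if __name__ == "__main__":
    for m in build_prompt("We fine-tuned a language model on pathology reports..."):
        print(f"{m['role'].upper()}: {m['content']}\n")
```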


8 of 19

Test out prompts on multiple AI platforms

Websites such as https://poe.com/ and https://gpt.h2o.ai/ put multiple competing models in one place.

Their differing training and algorithms will lead to differing results.


9 of 19

Prompt tuning or “P-tuning”

  • Using prompts to help train LLMs
  • More efficient than fine tuning
  • But may or may not address your customization needs
  • Like giving a crossword puzzle clue to the LLM

Source: https://research.ibm.com/blog/what-is-ai-prompt-tuning
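For readers who want to see the idea in code, below is a minimal sketch of prompt tuning with the Hugging Face peft library (an illustrative toolchain; the slides do not prescribe a tool). The base model, the number of virtual tokens, and the initialization text are placeholder choices.

```python
# Prompt tuning ("P-tuning") sketch: the base model's weights stay frozen;
# only a small set of "soft prompt" embeddings is trained.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = "bigscience/bloomz-560m"  # a small open model (placeholder choice)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify this pathology report:",  # the "crossword clue"
    num_virtual_tokens=8,            # only these prompt embeddings get trained
    tokenizer_name_or_path=base,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # typically well under 0.1% of the model
```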


10 of 19

Test out prompt tuning

https://gpt.h2o.ai/ offers a type of prompt tuning: go to the Expert tab.


11 of 19

Training from scratch vs. fine tuning

Training a baby from scratch for a job that requires specialized education would be costly and inefficient.

Instead, you find a person who already has a lot of the training you need and then fine-tune their skills.

Pictures from OpenMoji.org


12 of 19

Training from scratch

Training models from scratch requires enormous amounts of data and compute.

ChatGPT cost ~$100 million to create.

It’s almost never where you will want to start.

Pictures from OpenMoji.org

Source: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/


13 of 19

Before exploring fine tuning:

Are you sure no other model works?

(If you’ve only tried ChatGPT, go try other AI platforms.)

https://www.nature.com/articles/d41586-023-03023-4


14 of 19

Fine tuning: train an existing model to do the job better

Finding an open-source base model to train for your purposes:

    • What AI model is trained on data most closely related to your application?
    • Which base model performs best in your prior testing? (You can test LLMs side by side on platforms such as https://poe.com/ or https://gpt.h2o.ai/.)
    • Choose the smallest one that you can get away with.

See also: https://huggingface.co/blog/large-language-models
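As a rough illustration of what comes after choosing a base model, here is a minimal fine-tuning sketch using the Hugging Face transformers Trainer (one possible toolchain, not the only one). The model name and the two toy training texts are placeholders for a cleaned, domain-specific corpus.

```python
# Fine-tuning sketch: continue training a small open-source causal LM on
# your own text so it does your job better.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "distilgpt2"                        # "the smallest one you can get away with"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 style models have no pad token
model = AutoModelForCausalLM.from_pretrained(base)

texts = [                                  # placeholder domain-specific examples
    "Question: What does HER2-positive mean? Answer: ...",
    "Question: What is a germline variant? Answer: ...",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")
```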


15 of 19

https://s10251.pcdn.co/wp-content/uploads/2023/03/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.png


16 of 19


17 of 19

Fine tuning: train an existing model to do the job better

  • How will you evaluate an AI model’s performance?*
  • Where do the existing models fall short?
    • What information does it need to perform better?
      • Do you have this data?
      • How much cleaning will it need?
      • Is it unprotected?*

*More on these items in future chapters
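One concrete way to approach the first question is sketched below: hold out a small set of prompts with known reference answers and score the model against them. The generate_answer() helper and the example questions are hypothetical placeholders for whichever base or fine-tuned model you are assessing.

```python
# A simple evaluation sketch: exact-match accuracy on held-out prompts.
def generate_answer(prompt: str) -> str:
    raise NotImplementedError("call your base or fine-tuned model here")

def exact_match_accuracy(eval_set: list[tuple[str, str]]) -> float:
    """Fraction of held-out prompts whose model answer matches the reference."""
    hits = 0
    for prompt, reference in eval_set:
        prediction = generate_answer(prompt).strip().lower()
        hits += prediction == reference.strip().lower()
    return hits / len(eval_set)

held_out = [  # examples the model never saw during tuning (placeholders)
    ("Is BRCA1 a tumor suppressor gene? Answer yes or no.", "yes"),
    ("Is aspirin a monoclonal antibody? Answer yes or no.", "no"),
]
# print(exact_match_accuracy(held_out))  # run once generate_answer is wired up
```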


18 of 19

Example starting points for Fine or P-tuning

P-tuning through a hosted GUI (see the P-tuning Docs)
  • What is needed: no code needed; subscription fees; data cleaning likely needed
  • For use with protected data? No

Fine tuning from a GUI or the command line (see the Fine Tuning Docs)
  • What is needed: can be used from the GUI or the command line; data cleaning likely needed
  • For use with protected data? Not from the GUI, but the command line could if built right

Fine tuning in Python
  • What is needed: Python needed; data cleaning likely needed
  • For use with protected data? If you build it right*

*More on this in the next chapter


19 of 19

[Diagram of the project lifecycle: you are here; ongoing monitoring continues throughout the project.]
