1 of 50

Prompt Engineering Fundamentals

dlab.berkeley.edu

2 of 50

This workshop

An introduction to LLMs
What is Prompt engineering?
Prompting techniques
Tips

📝 Poll 1: How often do you use LLMs like Claude or ChatGPT, and what do you use them for?

3 of 50

LLM chatbots

ChatGPT, Claude, Gemini:

Generative Pre-trained Transformers

Generative because they generate text
Pretrained because they are trained before use
Transformer is the architecture

LLMs are trained on large amounts of text and generate, given some input, an output that is reasonable
Learn more about technical details of GPT in our GPT Fundamentals workshop.

4 of 50

5 of 50

What is a “large amount of text”?

GPT3 trained over 500GB of text

Billions of webpages; trillions of words
a “blurry JPEG” or “zip-file” of the web
500 million digitized books; billions of words
Public code from github, stack overflow and elsewhere
170 billion connections between words

This takes a long time (months of parallel supercomputer processing) and a lot of money (billion USD to train, 100 million active users per month = half a million USD per day)

6 of 50

7 of 50

8 of 50

How do LLM chatbots work?

Write one word at a time based on prediction.
Add some randomness (‘temperature’) for each word that is predicted.
Taught to align with human values (whose values?).

9 of 50

How are words predicted?

Brute force
Every word in the English language is assigned a number (about 50K words)
It looks up what your query corresponds to in number-words
It runs those numbers against the billions of weights it has learned during training
Outputs another list with all words in the English language with a probability next to each one
One of the most probable words is chosen (depending on temp)

10 of 50

Are models probabilistic or deterministic?

Yes to both
Step 1: Predict Next Tokens

Model outputs a ranked list of likely next words (e.g., “fence” (0.77), “ledge” (0.12)).
This step is deterministic — same prompt, same distribution.

Step 2: Decode into Text

Converts probabilities into words using a decoding strategy
Greedy: always pick top token (deterministic)
Sampling: pick randomly based on probabilities (stochastic)

Temperature controls randomness

0 = predictable; higher = more surprising

11 of 50

What is prompt engineering?

The art of effectively communicating with AI to elicit desired results.

12 of 50

https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf

https://www.oneusefulthing.org/p/an-opinionated-guide-to-using-ai

13 of 50

What is prompt engineering?

14 of 50

Before you do prompt engineering…

Setting good success criteria

A clear definition of the success criteria for your use case
Some ways to empirically test against those criteria
A first draft prompt you want to improve

15 of 50

Model selection matters

Use powerful models (Claude Opus, GPT 5 thinking mode, Gemini Pro) for serious work
Fast models are fine for casual chat but insufficient for high-stakes tasks
Most systems default to fast models to save computing power
Many issues can most easily improved by selecting a different model
Older AI models required you to generate a prompt using techniques like chain-of-thought.
As AI models get better, the importance of this fades and the models get better at figuring out what you want.

16 of 50

📝 Poll 2: Model selection

You have two models:

GPT-5 Instant
GPT-5 Thinking (reasoning model) �

Which one would you use for each of these tasks? Why?

Writing a grant application outline
Creating 50 Twitter ad variants
Debugging Python code with obscure errors
Converting a meeting transcript into action items

17 of 50

Reasoning models

e.g. GPT 5 Thinking, Gemini Pro, Claude Haiku/Sonnet/Opus
Generate an internal chain of thought to analyze the input prompt, and excel at understanding complex tasks and multi-step planning. They are also generally slower and more expensive to use.

18 of 50

Reasoning or not?

A reasoning model is like a senior co-worker. You can give them a goal to achieve and trust them to work out the details.
A non-reasoning model is like a junior coworker. They'll perform best with explicit instructions to create a specific output.

Some models (like Claude’s) are hybrid models which decide when to “turn on” thinking mode

19 of 50

Kinds of input

20 of 50

Prompting techniques

Write clear instructions
Specify output
Provide examples (few-shot)
Apply constraints
Provide a role

21 of 50

Let’s start from a barebones prompt

Suppose we’re interested in searching for some datasets on environmental discourse as a potential for a final project.

We could try…

https://chatgpt.com/share/68a6638d-4030-8012-837b-6b241d245350

22 of 50

Let’s start from a barebones prompt

Or, in newer models, we may trigger “thinking”, which can produce some interesting results…

23 of 50

While this is good, we can do better!

24 of 50

1. Write clear instructions

Give context

What the task results will be used for
What workflow the task is a part of, and where this task belongs in that workflow
The end goal of the task, or what a successful task completion looks like

Include details to get more relevant answers

E.g. website URLs (most chatbots can access the internet), copy-pasted documentation

Use delimiters to clearly indicate distinct parts of the input

E.g. wrap your examples in XML tags for structure

25 of 50

Example: Write clear instructions

Worse	Better
How do I add numbers in Excel?	How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".
Summarize the meeting notes.	Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
Analyze this agreement for potential risks and liabilities: {{CONTRACT}}. Focus on indemnification, limitation of liability, and IP ownership clauses.	Analyze this software licensing agreement for legal risks and liabilities. We’re a multinational enterprise considering this agreement for our core data infrastructure. <agreement> {{CONTRACT}} </agreement>

26 of 50

Exercise: Write clear instructions

Use ChatGPT, Gemini or Claude
Improve the following prompt and observe the difference

PROMPT

Summarize this.�https://arxiv.org/abs/1706.03762

27 of 50

2. Specify output

28 of 50

3. Provide examples (few-shot)

29 of 50

30 of 50

31 of 50

4. Apply constraints

32 of 50

5. Provide a role

Worse	Better
Analyze this software licensing agreement for potential risks: <contract> {{CONTRACT}} </contract> Focus on indemnification, liability, and IP ownership.	You are the General Counsel of a Fortune 500 tech company. We’re considering this software licensing agreement for our core data infrastructure: <contract> {{CONTRACT}} </contract> Analyze it for potential risks, focusing on indemnification, liability, and IP ownership. Give your professional opinion.

33 of 50

Exercise: Provide a role

ROLE PROMPT

• You are a friendly high-school teacher…

• You are a meticulous patent lawyer…

• You are a sarcastic stand-up comic…

PROMPT

Explain quantum computing to me.

34 of 50

📝 Poll 3: Strengths and Weaknesses

Tasks

Summarize a 30-page PDF into five bullet points
Write Python code that reads a CSV and plots a bar chart
Explain whether a data-sharing plan complies with new European AI Act requirements
Predict whether a specific paper will be accepted to NeurIPS based on a PDF draft
Debug a failing SQL query by inspecting database schema and proposing a correct join
Convert a messy Teams/Zoom transcript into a clean set of minutes with assigned action items

Classify each as an LLM strength or weakness.

35 of 50

Privacy

Do not upload sensitive data.
You can turn off learning in Settings of model platforms if you are a paid member, but you still won’t know where the data is.

36 of 50

(Preventing) hallucination

Not really a specific term, used for any undesirable output.
LLMs confidently make up facts and tend to sycophantic behavior
For fact-checked information with citations, consider platforms like Perplexity AI
For creative writing, brainstorming, or general conversation, use Claude, GPT, or Gemini

37 of 50

Prompt injection

Models do not really distinguish between instructions and input

https://simonwillison.net/series/prompt-injection/

38 of 50

Tips

1. Provide rich context

Upload documents, images, or files whenever possible
Give background information about yourself and your situation
AI models only know what's in the current chat

2. Ask for abundance

Request 50 ideas instead of 10
Ask for 30 ways to improve a sentence
AI doesn't get tired or resentful

39 of 50

Tips

3. Use interactive approaches

Engage in back-and-forth conversations
Push back and question the AI's responses
Encourage the AI to ask you clarification questions when doing a more complicated analysis (better for reasoning models)

4. Advanced features to leverage

Deep Research: Generate comprehensive, well-cited reports
Voice Mode: Natural conversation with screen/camera sharing
Multimodal: Point camera at problems for real-time analysis

40 of 50

Tips

5. Common pitfalls to avoid

Relying on default settings and fast models
Not providing sufficient context
Using AI like Google (quick questions without depth)
Forgetting that AI can hallucinate confidently
Not engaging in iterative refinement

41 of 50

Exercise: Iterative Prompt Refinement

Your task is to create a prompt that makes an LLM effectively evaluate customer service email responses. You'll start with a basic prompt and iteratively improve it using the techniques we've covered.

Goal:

Build a prompt that can accurately assess customer service emails on a scale of 1-10, considering factors like helpfulness, tone, completeness, and professionalism.

42 of 50

Test Case Materials

Customer Inquiry: "I ordered a laptop 5 days ago (Order #12345) and it still hasn't shipped. The website said 2-3 day processing. I need this for an important presentation next week. What's going on?"

Three Response Examples to Evaluate:

Response A: "Hi! Thanks for reaching out. I see your order #12345 for the laptop. There was a slight delay in our warehouse, but I've personally escalated this and it will ship tomorrow with expedited delivery at no extra charge. You'll receive tracking info within 24 hours. I've also added a 15% discount to your account for the inconvenience. I understand how important this is for your presentation - please let me know if you need anything else!"

Response B: "Your order is delayed. It will ship soon. Check your email for tracking."

Response C: "I apologize for any inconvenience. Unfortunately, that item is currently backordered and we don't have an estimated ship date. You can cancel your order if you'd like a refund, or wait for it to become available. Let me know what you'd prefer."

43 of 50

Stage 1: Basic Prompt

Create your initial prompt without any special techniques:

Your Prompt:

[Write your basic prompt here]

Test Results: Rate how well it evaluates the three responses. What's missing?

44 of 50

Stage 2: Add Clear Instructions

Revise your prompt with specific, clear instructions about what to evaluate and how.

Techniques to Apply:

Be specific about evaluation criteria
Define what makes a good vs. poor response
Clarify the rating scale

Test Results: How did the clarity improve the evaluations?

45 of 50

Stage 3: Add Examples (Few-Shot)

Include 1-2 example evaluations to show the LLM exactly what you want.

Techniques to Apply:

Provide sample customer service emails with ideal evaluations
Show the reasoning process you want
Demonstrate the output format in action

Test Results: Are the evaluations more aligned with your expectations?

46 of 50

Stage 4: Apply Constraints

Add specific constraints and boundaries to prevent unwanted outputs.

Techniques to Apply:

Set score ranges and what triggers each score
Specify what NOT to do
Add guidelines for edge cases
Ensure consistent scoring criteria

Test Results: Are the scores more reliable and consistent?

47 of 50

Stage 5: Provide a Role

Give the LLM a specific role/persona to enhance its performance.

Techniques to Apply:

Assign relevant expertise (customer service trainer, quality assurance manager, etc.)
Define the LLM's background and perspective
Explain the context of why these evaluations matter

Final Test Results: Compare your final prompt's performance to your initial version.

48 of 50

Reflection Questions

Biggest Improvement: Which technique made the most dramatic improvement to your prompt's performance?

Unexpected Challenges: What aspects of the task were harder to prompt for than expected?

Trade-offs: Did any techniques conflict with each other or require balancing?

Real-world Application: How would you adapt this approach for other evaluation tasks in your work?

49 of 50

The Future of Data Science

50 of 50

Join D-Lab’s Mailing List

Stay up to date with upcoming workshops, and campus job and funding opportunities!

dlab.berkeley.edu/newsletter