1 of 50

Prompt Engineering Fundamentals

dlab.berkeley.edu

2 of 50

This workshop

  • An introduction to LLMs
  • What is Prompt engineering?
  • Prompting techniques
  • Tips

📝 Poll 1: How often do you use LLMs like Claude or ChatGPT, and what do you use them for?

3 of 50

LLM chatbots

  • ChatGPT, Claude, Gemini:
    • Generative Pre-trained Transformers
      • Generative because they generate text
      • Pretrained because they are trained before use
      • Transformer is the architecture
    • LLMs are trained on large amounts of text and generate, given some input, an output that is reasonable
    • Learn more about technical details of GPT in our GPT Fundamentals workshop.

4 of 50

5 of 50

What is a “large amount of text”?

  • GPT3 trained over 500GB of text
    • Billions of webpages; trillions of words
    • a “blurry JPEG” or “zip-file” of the web
    • 500 million digitized books; billions of words
    • Public code from github, stack overflow and elsewhere
    • 170 billion connections between words
  • This takes a long time (months of parallel supercomputer processing) and a lot of money (billion USD to train, 100 million active users per month = half a million USD per day)

6 of 50

7 of 50

8 of 50

How do LLM chatbots work?

  • Write one word at a time based on prediction.
  • Add some randomness (‘temperature’) for each word that is predicted.
  • Taught to align with human values (whose values?).

9 of 50

How are words predicted?

  • Brute force
  • Every word in the English language is assigned a number (about 50K words)
  • It looks up what your query corresponds to in number-words
  • It runs those numbers against the billions of weights it has learned during training
  • Outputs another list with all words in the English language with a probability next to each one
  • One of the most probable words is chosen (depending on temp)

10 of 50

Are models probabilistic or deterministic?

  • Yes to both
  • Step 1: Predict Next Tokens
    • Model outputs a ranked list of likely next words (e.g., “fence” (0.77), “ledge” (0.12)).
    • This step is deterministic — same prompt, same distribution.
  • Step 2: Decode into Text
    • Converts probabilities into words using a decoding strategy
    • Greedy: always pick top token (deterministic)
    • Sampling: pick randomly based on probabilities (stochastic)
  • Temperature controls randomness
    • 0 = predictable; higher = more surprising

11 of 50

What is prompt engineering?

The art of effectively communicating with AI to elicit desired results.

12 of 50

13 of 50

What is prompt engineering?

14 of 50

Before you do prompt engineering…

  • Setting good success criteria
    • A clear definition of the success criteria for your use case
    • Some ways to empirically test against those criteria
    • A first draft prompt you want to improve

15 of 50

Model selection matters

  • Use powerful models (Claude Opus, GPT 5 thinking mode, Gemini Pro) for serious work
  • Fast models are fine for casual chat but insufficient for high-stakes tasks
  • Most systems default to fast models to save computing power
  • Many issues can most easily improved by selecting a different model
  • Older AI models required you to generate a prompt using techniques like chain-of-thought.
  • As AI models get better, the importance of this fades and the models get better at figuring out what you want.

16 of 50

📝 Poll 2: Model selection

  • You have two models:
    1. GPT-5 Instant
    2. GPT-5 Thinking (reasoning model) �
  • Which one would you use for each of these tasks? Why?
    • Writing a grant application outline
    • Creating 50 Twitter ad variants
    • Debugging Python code with obscure errors
    • Converting a meeting transcript into action items

17 of 50

Reasoning models

  • e.g. GPT 5 Thinking, Gemini Pro, Claude Haiku/Sonnet/Opus
  • Generate an internal chain of thought to analyze the input prompt, and excel at understanding complex tasks and multi-step planning. They are also generally slower and more expensive to use.

18 of 50

Reasoning or not?

  • A reasoning model is like a senior co-worker. You can give them a goal to achieve and trust them to work out the details.
  • A non-reasoning model is like a junior coworker. They'll perform best with explicit instructions to create a specific output.
  • Some models (like Claude’s) are hybrid models which decide when to “turn on” thinking mode

19 of 50

Kinds of input

20 of 50

Prompting techniques

  1. Write clear instructions
  2. Specify output
  3. Provide examples (few-shot)
  4. Apply constraints
  5. Provide a role

21 of 50

Let’s start from a barebones prompt

Suppose we’re interested in searching for some datasets on environmental discourse as a potential for a final project.

We could try…

https://chatgpt.com/share/68a6638d-4030-8012-837b-6b241d245350

22 of 50

Let’s start from a barebones prompt

Or, in newer models, we may trigger “thinking”, which can produce some interesting results…

23 of 50

While this is good, we can do better!

24 of 50

1. Write clear instructions

  • Give context
    • What the task results will be used for
    • What workflow the task is a part of, and where this task belongs in that workflow
    • The end goal of the task, or what a successful task completion looks like
  • Include details to get more relevant answers
    • E.g. website URLs (most chatbots can access the internet), copy-pasted documentation
  • Use delimiters to clearly indicate distinct parts of the input
    • E.g. wrap your examples in XML tags for structure

25 of 50

Example: Write clear instructions

Worse

Better

How do I add numbers in Excel?

How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".

Summarize the meeting notes.

Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.

Analyze this agreement for potential risks and liabilities: {{CONTRACT}}. Focus on indemnification, limitation of liability, and IP ownership clauses.

Analyze this software licensing agreement for legal risks and liabilities.

We’re a multinational enterprise considering this agreement for our core data infrastructure.

<agreement> {{CONTRACT}} </agreement>

26 of 50

Exercise: Write clear instructions

  • Use ChatGPT, Gemini or Claude
  • Improve the following prompt and observe the difference

PROMPT

Summarize this.�https://arxiv.org/abs/1706.03762

27 of 50

2. Specify output

28 of 50

3. Provide examples (few-shot)

29 of 50

30 of 50

31 of 50

4. Apply constraints

32 of 50

5. Provide a role

Worse

Better

Analyze this software licensing agreement for potential risks:

<contract>

{{CONTRACT}}

</contract>

Focus on indemnification, liability, and IP ownership.

You are the General Counsel of a Fortune 500 tech company. We’re considering this software licensing agreement for our core data infrastructure:

<contract>

{{CONTRACT}}

</contract>

Analyze it for potential risks, focusing on indemnification, liability, and IP ownership. Give your professional opinion.

33 of 50

Exercise: Provide a role

ROLE PROMPT

• You are a friendly high-school teacher…

• You are a meticulous patent lawyer…

• You are a sarcastic stand-up comic…

PROMPT

Explain quantum computing to me.

34 of 50

📝 Poll 3: Strengths and Weaknesses

Tasks

  • Summarize a 30-page PDF into five bullet points
  • Write Python code that reads a CSV and plots a bar chart
  • Explain whether a data-sharing plan complies with new European AI Act requirements
  • Predict whether a specific paper will be accepted to NeurIPS based on a PDF draft
  • Debug a failing SQL query by inspecting database schema and proposing a correct join
  • Convert a messy Teams/Zoom transcript into a clean set of minutes with assigned action items

Classify each as an LLM strength or weakness.

35 of 50

Privacy

  1. Do not upload sensitive data.
  2. You can turn off learning in Settings of model platforms if you are a paid member, but you still won’t know where the data is.

36 of 50

(Preventing) hallucination

  • Not really a specific term, used for any undesirable output.
  • LLMs confidently make up facts and tend to sycophantic behavior
  • For fact-checked information with citations, consider platforms like Perplexity AI
  • For creative writing, brainstorming, or general conversation, use Claude, GPT, or Gemini

37 of 50

Prompt injection

Models do not really distinguish between instructions and input

38 of 50

Tips

1. Provide rich context

  • Upload documents, images, or files whenever possible
  • Give background information about yourself and your situation
  • AI models only know what's in the current chat

2. Ask for abundance

  • Request 50 ideas instead of 10
  • Ask for 30 ways to improve a sentence
  • AI doesn't get tired or resentful

39 of 50

Tips

3. Use interactive approaches

  • Engage in back-and-forth conversations
  • Push back and question the AI's responses
  • Encourage the AI to ask you clarification questions when doing a more complicated analysis (better for reasoning models)

4. Advanced features to leverage

  • Deep Research: Generate comprehensive, well-cited reports
  • Voice Mode: Natural conversation with screen/camera sharing
  • Multimodal: Point camera at problems for real-time analysis

40 of 50

Tips

5. Common pitfalls to avoid

  • Relying on default settings and fast models
  • Not providing sufficient context
  • Using AI like Google (quick questions without depth)
  • Forgetting that AI can hallucinate confidently
  • Not engaging in iterative refinement

41 of 50

Exercise: Iterative Prompt Refinement

Your task is to create a prompt that makes an LLM effectively evaluate customer service email responses. You'll start with a basic prompt and iteratively improve it using the techniques we've covered.

Goal:

Build a prompt that can accurately assess customer service emails on a scale of 1-10, considering factors like helpfulness, tone, completeness, and professionalism.

42 of 50

Test Case Materials

Customer Inquiry: "I ordered a laptop 5 days ago (Order #12345) and it still hasn't shipped. The website said 2-3 day processing. I need this for an important presentation next week. What's going on?"

Three Response Examples to Evaluate:

Response A: "Hi! Thanks for reaching out. I see your order #12345 for the laptop. There was a slight delay in our warehouse, but I've personally escalated this and it will ship tomorrow with expedited delivery at no extra charge. You'll receive tracking info within 24 hours. I've also added a 15% discount to your account for the inconvenience. I understand how important this is for your presentation - please let me know if you need anything else!"

Response B: "Your order is delayed. It will ship soon. Check your email for tracking."

Response C: "I apologize for any inconvenience. Unfortunately, that item is currently backordered and we don't have an estimated ship date. You can cancel your order if you'd like a refund, or wait for it to become available. Let me know what you'd prefer."

43 of 50

Stage 1: Basic Prompt

Create your initial prompt without any special techniques:

Your Prompt:

[Write your basic prompt here]

Test Results: Rate how well it evaluates the three responses. What's missing?

44 of 50

Stage 2: Add Clear Instructions

Revise your prompt with specific, clear instructions about what to evaluate and how.

Techniques to Apply:

  • Be specific about evaluation criteria
  • Define what makes a good vs. poor response
  • Clarify the rating scale

Test Results: How did the clarity improve the evaluations?

45 of 50

Stage 3: Add Examples (Few-Shot)

Include 1-2 example evaluations to show the LLM exactly what you want.

Techniques to Apply:

  • Provide sample customer service emails with ideal evaluations
  • Show the reasoning process you want
  • Demonstrate the output format in action

Test Results: Are the evaluations more aligned with your expectations?

46 of 50

Stage 4: Apply Constraints

Add specific constraints and boundaries to prevent unwanted outputs.

Techniques to Apply:

  • Set score ranges and what triggers each score
  • Specify what NOT to do
  • Add guidelines for edge cases
  • Ensure consistent scoring criteria

Test Results: Are the scores more reliable and consistent?

47 of 50

Stage 5: Provide a Role

Give the LLM a specific role/persona to enhance its performance.

Techniques to Apply:

  • Assign relevant expertise (customer service trainer, quality assurance manager, etc.)
  • Define the LLM's background and perspective
  • Explain the context of why these evaluations matter

Final Test Results: Compare your final prompt's performance to your initial version.

48 of 50

Reflection Questions

Biggest Improvement: Which technique made the most dramatic improvement to your prompt's performance?

Unexpected Challenges: What aspects of the task were harder to prompt for than expected?

Trade-offs: Did any techniques conflict with each other or require balancing?

Real-world Application: How would you adapt this approach for other evaluation tasks in your work?

49 of 50

The Future of Data Science

50 of 50

Join D-Lab’s Mailing List

Stay up to date with upcoming workshops, and campus job and funding opportunities!

dlab.berkeley.edu/newsletter