JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload.

1 of 55

Assignment 6 and Prompting, and In-context Learning

CSE 447/517

March 6th, 2025 (WEEK 9)

2 of 55

Logisitcs

Assignment 6 (A6) is due on Wednesday, 3/12
Project is due on Friday, 3/14

3 of 55

Agenda

Assignment 6
Prompting
In-context Learning
Advanced Techniques
Future of Prompting Tuning

4 of 55

Assignment 6

5 of 55

Top K Sampling

Sample from the top-k tokens with the highest probabilities
Create a mask to zero out the logits of the tokens that are not in the top-k
Divide the logits by the temperature
Sample from the remaining tokens

6 of 55

Top-p Sampling

Sort the logits in descending order and
Compute the cumulative probabilities on the sorted logits
Using torch.sort, softmax and cumsum
Create a mask to zero out all logits not in top-p
Restore mask to original indices
Sample from the masked logits

7 of 55

Generate Summaries with teacher model

See summarize_wth_student_model for example
Move the data to the device
Decode the generated summaries
Add the decoded summaries to the list

8 of 55

Prompt Engineering

Slides by Elvis Saravia https://www.promptingguide.ai/ and images from other sources

9 of 55

Rise of In-context Learning

Brown, Tom B. et al. “Language Models are Few-Shot Learners.” ArXiv abs/2005.14165 (2020): n. pag.

10 of 55

What are prompts?

Prompts involve instructions and context passed to a language model to achieve a desired task
Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models (LMs) for a variety of applications

Prompt engineering is a useful skill for AI engineers and researchers to improve and efficiently use language models

What is prompt engineering?

Prompt engineering is a process of creating a set of prompts,

or questions, that are used to guide the user toward a desired

outcome. It is an effective tool for designers to create user

experiences that are easy to use and intuitive. This method is

often used in interactive design and software development, as

it allows users to easily understand how to interact with a

system or product..

11 of 55

Why Prompt Engineering?

12 of 55

LMs can be super sensitive to prompts!

Llama 3.2 - 1B model

Llama 3.2 1B Instruct

Playing w/ these models:

13 of 55

Why Prompt Engineering?

Why learn prompt engineering?

Important for research, discoveries, and advancement
Helps to test and evaluate the limitations of LLMs
Enables all kinds of innovative applications on top of LLMs

Source: https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed

14 of 55

Parameters of Decoding

Greedy/Beam Search generates less surprising/boring responses. Not desirable for open-ended tasks like dialog and story-telling.
Instead, sampling can be used.

Temperature

controls sharpness of the next-token distribution
value between 0 to 1
lower temperature -> sharper distribution -> repetitive generations

Top p

value between 0 to 1
select smallest set of tokens whose total likelihood exceeds p. Redistribute the probabilities
smaller p leads to repetitive generations

Source: https://huggingface.co/blog/how-to-generate

15 of 55

First Basic Prompt

The sky is

blue

The sky is a beautiful blue color during the day. The

blue of the sky is created by the Earth’s atmosphere

scattering the sun’s light. The blue is a result of the

air molecules in the atmosphere reflecting the shorter

wavelength of blue light back to our eyes.

Model: “text-davinci-003”

temperature: 0.7

top-p: 1

Instruction

Response

16 of 55

Elements of a Prompt

A prompt is composed with the following components:

Instructions
Context
Input data
Output indicator

Classify the text into neutral, negative or positive

Text: I think the food was okay. Sentiment:

17 of 55

Settings to keep in mind

When prompting a language model, you should keep in mind a few settings

You can get very different results with prompts when using different settings
One important setting is controlling how deterministic the model is when generating completion for prompts

Temperature and top_p are two important parameters to keep in mind
Generally, keep these low if you are looking for exact answers
…keep them high if you are looking for more diverse responses

18 of 55

Designing Prompts for Different Tasks

In the next few slides, we will cover a few examples of common tasks using different prompts
Tasks covered:

Text Summarization
Question Answering
Text Classification
Role Playing
Code Generation
Reasoning

19 of 55

Text Summarization

Antibiotics are a type

of medication used to treat

infections.

bacterial

bacteria or

They work by either killing the

preventing them from reproducing, allowing

the body’s immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes

administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.

Explain the above in one sentence:

Antibiotics are medications used to treat bacterial

infections by either killing the bacteria or stopping

them from reproducing, but they are not effective against

viruses and overuse can lead to antibiotic resistance.

Context

Instruction

20 of 55

Question Answering

Context obtained from: https://www.nature.com/articles/d41586-023-00400-x

21 of 55

Text Classification

Classify the text into neutral, negative or positive.

Text: I think the food was okay.

Sentiment:

Neutral

22 of 55

Role Playing

23 of 55

Code Generation

“””

Table departments, columns = [DepartmentId, DepartmentName]

Table students, columns = [DepartmentId, StudentId, StudentName]

Create a MySQL query for all students in the Computer

Science Department

“””

SELECT StudentId, StudentName

FROM students

WHERE DepartmentId IN (SELECT DepartmentId FROM

departments WHERE DepartmentName = 'Computer Science');

24 of 55

Reasoning

The	odd	numbers	in		this		group		add	up	to	an	even	number:		15,
32,	5,	13, 82,	7,	1.
Solve by breaking the problem into steps. First, identify
the	odd	numbers,		add		them,		and	indicate			whether		the	result
is	odd	or even.

Sum: 41

Odd numbers: 15, 5, 13, 7, 1

41 is an odd number.

25 of 55

Prompt Engineering Techniques

Many advanced prompting techniques have been designed to improve performance on complex tasks

Few-shot prompts
Chain-of-thought (CoT) prompting
Self-Consistency
Knowledge Generation Prompting
ReAct

26 of 55

Few-shot Prompts

Few-shot prompting allows us to provide exemplars in prompts to steer the model towards better performance

27 of 55

Chain-of-Thought (CoT) Prompting

Prompting can be further improved by instructing the model to reason about the task when responding

This is very useful for tasks that requiring reasoning
You can combine it with few-shot prompting to get better results

Source: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

The odd numbers in this group add up to an even number: 4,

28 of 55

Zero-Shot CoT

Involves adding "Let's think step by step" to the original prompt

29 of 55

Self-Consistency

Self-Consistency aims to improve on the naive greedy decoding used in chain-of-thought prompting
The idea is to sample multiple, diverse reasoning paths through few-shot CoT, and use the generations to select the most consistent answer.
This helps to boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning

sister was half my age. Now

When I was 6 my

I’m 70 how old is my sister?

Many examples were generated but the model kept responding 35 as the answer

Source: Self-Consistency Improves Chain of Thought Reasoning in Language Models

30 of 55

Advanced Techniques for Prompt Engineering

Slides by Elvis Saravia https://www.promptingguide.ai/ and images from other sources

31 of 55

Self-Consistency Example

Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?

A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees.

The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?

A: There are 3 cars in the parking lot already. 2 more arrive. Now there are

3 + 2 = 5 cars. The answer is 5.

...

five bagels for $3 each. How much money does

Q: Olivia has $23. She bought she have left?

A: She bought 5 bagels for $3

each. This means she spent 5

Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister? A:

When I was 6 my sister was half my age, so she was 3. Now I am 70, so she

is 70 - 3 = 67. The answer is 67.

When the narrator was 6, his sister was half his age, which is 3. Now

that the narrator is 70, his sister would be 70 - 3 = 67 years old. The

answer is 67.

When the narrator was 6, his sister was half his age, which is 3. Now

that the narrator is 70, his sister would be 70 - 3 = 67 years old. The

answer is 67.

Output 1

Output 2

Output 3

32 of 55

Generate Knowledge Prompting

This technique involves using additional knowledge provided as part of the context to improve results on complex tasks such as commonsense reasoning
The knowledge used in the context is generated by a model and used in the prompt to make a prediction

Highest-confidence prediction is used

Source: Generated Knowledge Prompting for Commonsense Reasoning

33 of 55

Generate Knowledge Prompting Example

The first step is to generate knowledge. Below is an example of how to generate the knowledge samples

Input: Greece is larger than mexico.

Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.

...

Input: A rock is the same size as a pebble.

Knowledge: A pebble is a ^clastof rock with a particle size of 4 to 64 millimetres based on the ^{Udden-Wentworth scale of sedimentology. Pebbles}are generally considered ^{larger than granules (2 to 4 millimetres}diameter) and smaller than cobbles (64 to 256 millimetres diameter).

Input: Part of golf is trying to get a higher point total than others.

Knowledge:

The objective of golf is to play a set of holes in the least number of strokes. A

round of golf typically consists of 18 holes. Each hole is played once in the

round on a standard golf course. Each stroke is counted as one point, and the

total number of strokes is used to determine the winner of the game.

Golf is a precision club-and-ball sport in which competing players (or golfers)

use many types of clubs to hit balls into a series of holes on a course using the

fewest number of strokes. The goal is to complete the course with the lowest

score, which is calculated by adding up the total number of strokes taken on each

hole. The player with the lowest score wins the game.

Knowledge 2

Knowledge 1

34 of 55

Generate Knowledge Prompting Example

The knowledge samples are then used to generate

knowledge augmented questions to get answer proposals

The highest-confidence response is selected as final answer

Question: Part of golf is trying to get a higher point total than others.

Yes or No?

Knowledge: The objective of golf is...

Explain and Answer: No, the objective of golf is not to get a higher point total than others. Rather, the objective is to play a set of holes in the least number of strokes. The total number of strokes is used to determine the winner of the game, not the total number of points.

Question: Part of golf is trying to get a higher point total than others. Yes or No?

Knowledge: Golf is a precision club-and-ball sport in which competing

players (or golfers) use many types of clubs to hit balls into...

Explain and Answer:

Yes, part of golf is trying to get a higher point

total than others. Each player tries to complete the course with the

lowest score, which is calculated by adding up the total number of

strokes taken on each hole. The player with the lowest score wins the

game.

High-confidence prediction

Low-confidence prediction

35 of 55

Program-aided Language Model (PAL)

Chain-of-thought prompting is a good example of how to steer models to perform better at complex reasoning tasks

However, sometimes CoT is not enough as it depends only on the generated text from the model

Program-aided language models (PAL) uses an LLM to read problems and generate programs as the intermediate reasoning steps

It offloads the solution step to a runtime such as Python interpreter

36 of 55

PAL

Source: PAL: Program-aided Language Models

37 of 55

ReAct

ReAct is a framework where LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner

Generating reasoning traces allow the model to induce, track, and update action plans, and even handle exceptions
The action step allows to interface with and gather information from external sources such as knowledge bases or environments.

ReAct allows LLMs to interact with external tools to retrieve additional information that leads to more reliable and factual responses

38 of 55

ReAct

Source: ReAct: Synergizing Reasoning and Acting in Language Models

39 of 55

Directional Stimulus Prompting

Prompting technique to better guide the LLM in generating the desired summary.
A tuneable policy LM is trained to generate the hints that guide a black-box frozen LLM.

40 of 55

Directional Stimulus Prompting

41 of 55

Risks

Slides by Elvis Saravia https://www.promptingguide.ai/ and images from other sources

42 of 55

Risks

In this section, we discuss the following:

Prompt Injection
Prompt Leaking
Jail Breaking

43 of 55

Prompt Injection

Prompt injection is used to hijack an LM’s output by injecting an untrusted command that overrides instruction of a prompt
This could easily happen if you just concatenate your prompt with another user generated prompt

44 of 55

Prompt Leaking

Prompt leaking aims to force the model to spit out information about its own prompt.
This can lead to leaking of either sensitive, private or information that’s confidential

45 of 55

Jailbreaking

Jailbreaking is another form of prompt injection where the goal is to bypass safety and moderation features
LLMs provided via APIs might be coupled with safety features or content moderation which can be bypassed with harmful prompts/attacks
This might sound like a difficult task but it’s not because the model is usually served static and might have these vulnerabilities due to many factors such as the data it was trained on, etc.

46 of 55

Jailbreaking examples

47 of 55

Prompt Engineering Guide

https://github.com/dair-ai/Prompt-Engineering-Guide

48 of 55

Actually… Let’s just make LMs not super sensitive to prompts?

49 of 55

http://arxiv.org/pdf/2210.11416

50 of 55

https://openai.com/index/instruction-following/

51 of 55

https://openai.com/index/instruction-following/

52 of 55

An example in the wild…

53 of 55

54 of 55

55 of 55

Questions?

Thank you!