1 of 49

Introduction to the Internet & the Web

CSCI 344: Advanced Web Technologies

Spring 2025

2 of 49

Announcements

  • Pre-Course Assessment
    • Solutions posted. 3 students got a perfect score
    • If you got < half correct, I emailed you. I highly recommend coming to one of the programming review sessions this Friday during my office hours.
    • My hunch is that you just need a few reminders…but you should be able to do problems like the ones on the pre-course assessment quickly and easily (but it takes practice).
    • Review + practice problems posted here.
  • Readings for Friday

3 of 49

Learning Goals for the Week

During this course, we’ll be building different websites and programs that will eventually be part of the Internet. But before we do, we’re going to think a bit about the “big picture.”

  • Today: How the Internet and web work
  • Friday: Societal considerations of the Internet:
    • How big of an issue is privacy and surveillance on the Internet?
    • How do Internet business models shape what gets seen and by whom?
    • Is GenAI destroying the Open Internet?

4 of 49

Let’s find out what you already know about the Internet...

5 of 49

Before we talk about the particulars of the Internet, let’s define it…

What is the Internet?

  • A computer telephone. Interconnected network of various computers. Military lovechild. Darpa. Playground of racist trolls. Immorality. Repo for info and mis-information.

How are the Internet and the web different?

6 of 49

The term “The Internet” has many meanings

  • “The Internet” can refer to the technical infrastructure (e.g. fiber optic cables, “last mile” networks, data centers, and TCP/IP) – discussed today.
  • “The Internet” also refers to the “social life of the Internet” (i.e., content and services that run on top of internet infrastructure) – discussed Friday.

7 of 49

What’s the difference between the Internet and the Web?

Oftentimes “The Web” and “The Internet” are used interchangeably. But from a CS perspective, they’re very different:

  • The Internet refers to the physical network infrastructure + TCP/IP protocol
  • The Web is one application that runs on top of the internet (others include Zoom, email, telnet, networked television, ssh, gaming engines, etc.).

8 of 49

Some questions about the Internet...

  1. What is the cloud?
  2. Who invented the Internet?
  3. Where is the Internet?
  4. What is TCP/IP?
  5. What is DNS?
  6. Who controls the Internet?
  7. What is Net Neutrality?
  8. Who can access your data over the internet?

9 of 49

What is the cloud? What are some examples of cloud services?

  • The cloud is basically having access to someone else’s computer(s)
    • Refers to software and information and computing services (including storage) that run on someone else’s computer instead of on yours
  • Cloud services can be accessed through a Web browser like Firefox or Google Chrome, through mobile devices, IoT services, etc.
    • What this means is that your computer is constantly interacting with the cloud.
  • Bottom line: your devices are interacting with other services and systems (and by extension other businesses and people) all of the time. This isn’t necessarily good or bad, but it’s important to think about.

10 of 49

Who Invented the Internet?

  • The internet began as ARPANET, an academic research network that was funded by the military (DARPA), beginning in 1969
  • In 1973, Vint Cerf and Bob Kahn began work on TCP/IP, the next generation of networking standards, which became the foundation of the modern internet
  • In 1981 funding for the internet shifted to the NSF, which funded the long-distance networks that served as the internet’s backbone until 1994
  • In 1994, the Clinton Administration privatized the internet backbone

11 of 49

[Figure: timeline images labeled 1969, 1970, 1973, and 1982]

12 of 49

Where is the Internet?

Three primary components:

  1. The Internet Backbone: Long-distance networks (mostly on fiber optic cables) that carry data between data centers and consumers
  2. The “Last Mile”: The part of the internet that connects homes and small businesses to the internet.
  3. Data Centers: Can be located anywhere in the world, but they are often located in remote areas where land and electricity are cheap.

13 of 49

Internet “Backbone”: Global

14 of 49

Internet “Backbone”: National

  • A very high-speed data transmission line that provides networking facilities to relatively small but high-speed Internet service providers all around the world.
  • Some of the largest companies running different parts of the Internet backbone include UUNET, AT&T, GTE Corp. and Sprint Nextel Corp.

15 of 49

The “Last Mile”: Local

  • The final connectivity leg between the telecommunication service provider and an individual customer (e.g. Comcast, Spectrum, AT&T, etc.).
  • The most widely used last mile technologies are DSL, cable (with a cable modem), fiber optic, and wireless access.

16 of 49

Pictured: A Google Data Center

17 of 49

Who controls the Internet?

  • No one runs the Internet. It’s organized as a decentralized network of networks.
  • Thousands of companies, universities, governments, and other entities operate their own networks and exchange traffic with each other based on voluntary interconnection agreements
  • There are technical standards committees (ICANN, IETF, etc.) that do meet and agree on rules of traffic exchange
  • Governments can filter and block traffic (as can companies, schools, etc.)

18 of 49

What is TCP/IP?

TCP (Transmission Control Protocol): Ensures reliable transmission of data across a network.

  • Breaks down data into smaller packets before they are sent.
  • Ensures packets are delivered, checks for errors, and reassembles them in the correct order on the receiving end.
    • On the other hand, UDP (User Datagram Protocol) does not ensure that packets are delivered (which makes it faster…and this is sometimes OK).
  • Provides flow control and congestion control to maintain the stability of the network.
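The packet splitting and in-order reassembly described above can be sketched in a few lines of Python. This is a toy simulation, not real TCP: the function names are made up for illustration, the packet size is arbitrary, and real TCP numbers bytes rather than whole packets and also handles acknowledgements, retransmission, and congestion control.

```python
import random


def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Break data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back in sequence-number order and join the chunks."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"Hello from CSCI 344!"
packets = split_into_packets(message, size=4)

# Simulate the network delivering packets out of order.
random.shuffle(packets)

# The receiver can still rebuild the original message.
assert reassemble(packets) == message
```

The sequence number is what lets the receiver restore the order no matter how the packets arrive.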

19 of 49

What is TCP/IP?

IP (Internet Protocol): Handles addressing and routing of packets to ensure they reach the correct destination.

  • Identifies each device on the network by a unique IP address.
  • Determines the best route for data to travel across the network.
  • Data doesn’t necessarily arrive in order (up to TCP/UDP to figure that out).

20 of 49

Why would you use UDP v. TCP?

UDP:

  • Used for time-sensitive applications (e.g. gaming, Zoom, playing videos, or Domain Name System (DNS) lookups)
  • If a packet drops here or there, oh well!

TCP:

  • Used when you need data integrity (# of bits sent === # of bits received).
  • Documents, etc.
  • Slower

21 of 49

What is DNS?

DNS (Domain Name System) is a way of assigning human-readable addresses to IP addresses (so people don’t have to remember long sequences of numbers).

  • When a client (e.g. a browser) asks for a website at a domain name, it first looks in its own cache for the associated IP address, then asks a DNS resolver (usually run by your ISP).
  • If that particular resolver can’t find it, a series of queries is issued to other DNS servers until one of them returns the IP address.
  • That domain name / IP address mapping is then passed back to the querying DNS servers and cached (so it’s faster next time).
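The "check the cache first, then ask another server" idea can be sketched as follows. Everything here is hypothetical: the `CachingResolver` class, the `AUTHORITATIVE` table, and the addresses (which come from ranges reserved for documentation). A real resolver talks to other servers over the network and expires cached entries after a time limit, which this sketch leaves out.

```python
# Hypothetical records standing in for the rest of the DNS system.
AUTHORITATIVE = {
    "example.com": "192.0.2.10",
    "example.org": "198.51.100.7",
}


class CachingResolver:
    """Toy resolver: answers from its cache when it can, otherwise asks upstream."""

    def __init__(self, upstream: dict[str, str]):
        self.upstream = upstream
        self.cache: dict[str, str] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]       # fast path: already known
        self.upstream_queries += 1        # slow path: ask another server
        ip = self.upstream[name]          # raises KeyError for unknown names
        self.cache[name] = ip             # remember it for next time
        return ip


resolver = CachingResolver(AUTHORITATIVE)
first = resolver.resolve("example.com")
second = resolver.resolve("example.com")  # served from the cache

assert first == second == "192.0.2.10"
assert resolver.upstream_queries == 1
```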

22 of 49

DNS Sample Lookup Table

23 of 49

What happens in a DNS Attack?

24 of 49

How do wireless communications work?

Radio waves: Each form of wireless communication uses a different range of spectral frequencies:

  • Some are licensed (e.g., Verizon, AT&T, etc. each “rent” different parts of the spectrum)
  • Some are open for anyone to use (e.g., for WiFi, Bluetooth, etc.)

25 of 49

26 of 49

Who can access your data (transmission & storage)?

What did the 2013 Snowden leaks reveal?

  • Britain's “Tempora” taps fiber optic cables around the world
  • NSA could hack into Google and Yahoo data centers w/o their knowledge
  • Verizon had been providing the NSA w/all of its phone records
  • NSA can request user data from companies, which they are compelled to deliver on by law (PRISM)
  • NSA undermines encryption via backdoors and promoting the use of weaker algorithms


27 of 49

Part 2: Intro to the Web

28 of 49

Let’s find out what you already know about the Web...

29 of 49

Some questions about the web...

  1. Who invented the World Wide Web?
  2. What is a URL?
  3. What is HTTP?
  4. What is a web server?
  5. What is a browser?
  6. What is an IP address?
  7. What is a domain name?
  8. What is a search engine?

30 of 49

Who invented the World Wide Web?

1989: Tim Berners-Lee, a British computer scientist, conceptualized and built the three foundational web technologies:

  • HTML: HyperText Markup Language. The markup (formatting) language for the web.
  • URI: Uniform Resource Identifier. A kind of “address” that is unique and used to identify each resource on the web. On the web, it is usually called a URL.
  • HTTP: Hypertext Transfer Protocol. Allows for the retrieval of linked resources from across the web.

31 of 49

What is a URL?

A URL, or Uniform Resource Locator, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.

A typical URL could have the form: http://www.example.com/index.html:

  • http: indicates the protocol to be used to retrieve the document
  • www.example.com: the hostname (domain name)
  • index.html: the name of the file to be retrieved
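As a quick check of the breakdown above, Python's standard `urllib.parse` module can split the same example URL into its parts. Note that it reports the file portion as a path with a leading slash.

```python
from urllib.parse import urlparse

parts = urlparse("http://www.example.com/index.html")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.path)      # /index.html
```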

32 of 49

What is HTTP?

  • Stands for Hypertext Transfer Protocol
  • Refers to the rules that servers and browsers must follow in order to transfer web files over the Internet.
  • HTTP allows (authorized) users to create, update, or delete resources on a web server
  • HTTPS adds encryption and security to the data transmission process

33 of 49

What is a web server?

  • A computer or device on a network that other computers can access for information, communication, or computational services
  • Any computer can become a web server if...
    1. It has been given a public address (IP address)
    2. It listens for requests on port 80 (HTTP) or port 443 (HTTPS)
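To make "any computer can become a web server" concrete, here is a minimal sketch using Python's standard library. It is a classroom demo under a few assumptions: it listens only on the local loopback address (127.0.0.1) rather than a public IP, and it lets the operating system pick a free port, since port 80 usually requires administrator rights. `HelloHandler` and `fetch_once` are names invented for this example.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        body = b"<h1>Hello, web!</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Start the server, make one request to it, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body)
```

The client half plays the role of a browser: it sends a GET request and receives a status code plus the HTML.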

34 of 49

What is a web browser?

  • A program that allows a user to locate, access, and display Web pages.
  • What are some examples of browsers?
    • Safari, Firefox, Chrome, Microsoft Edge, and Opera
  • In this class, we will be writing programs that browsers can understand.
    • Think of your browser as the thing that’s reading and interpreting your HTML, CSS, and JavaScript files.

[Diagram: three browsers and one web server]

35 of 49

What Can a Browser Do?

Browsers have several different jobs...

  1. They interact with servers to access resources
    1. text files (HTML, CSS, and JavaScript files)
    2. images
    3. data files
    4. video & audio files

Browsers can also create, delete, and modify server resources


36 of 49

What Can a Browser Do?

  1. They interpret instructions (that you will write) and render (i.e. “draw”) text, images, and graphics to the screen.
  2. They respond to user events via default behaviors or via custom behaviors that are controlled by JavaScript
  3. They write local data (cookies, local storage, password storage, history) to your hard drive


37 of 49

How a browser interprets files

Here are the steps that a browser follows to render an HTML page to the screen:

  1. Pulls down the HTML file
  2. Reads it, scans it for links to other resources (“src” and “href” attributes), and then pulls down linked files
  3. As it pulls down resources, it redraws the screen with the information. The addition of new image, CSS, and JavaScript files usually triggers a screen redraw
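Step 2 above (scanning for "src" and "href" attributes) can be imitated with Python's built-in HTML parser. This is only a sketch: the sample page and file names are invented, and the code lists every "src" and "href" it finds, whereas a real browser fetches stylesheets, scripts, and images right away but waits for a click before following an ordinary link.

```python
from html.parser import HTMLParser


class LinkedResourceFinder(HTMLParser):
    """Collects (tag, value) pairs for every src or href attribute."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.resources.append((tag, value))


# Hypothetical page used only for this example.
SAMPLE_HTML = """
<html>
  <head><link rel="stylesheet" href="styles.css"></head>
  <body>
    <img src="cat.png" alt="A cat">
    <script src="app.js"></script>
    <a href="about.html">About</a>
  </body>
</html>
"""

finder = LinkedResourceFinder()
finder.feed(SAMPLE_HTML)

for tag, value in finder.resources:
    print(tag, value)
```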


38 of 49

Using the Browser Inspector

Like with all programming, you will encounter errors as you develop your websites. The Browser Inspector is the very best resource that you have to help you resolve issues. It can help you...

  • Inspect and change elements and CSS properties
  • Examine the files that your browser retrieves
  • Examine requests and responses (communications)
  • Identify JavaScript errors


39 of 49

Activity 1

Examining different sources of content


40 of 49

What is an IP address?

  • An IP address is a network address for your computer, so the Internet knows where to send you emails, data, and pictures of cats
  • The IANA allocates IP address blocks to regional Internet registries. These are then divided into smaller sub-blocks and assigned to individuals and institutions
  • We have already run out of IPv4 addresses (e.g. 66.171.248.170), so we’re now in the process of transitioning to IPv6
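The size difference between the two address formats is easy to see with Python's standard `ipaddress` module. The address on the slide is a 32-bit IPv4 address; the IPv6 address below is an assumption for illustration, taken from the range reserved for documentation.

```python
import ipaddress

v4 = ipaddress.ip_address("66.171.248.170")
v6 = ipaddress.ip_address("2001:db8::1")  # documentation-only example address

# max_prefixlen is the number of bits in the address: 32 for IPv4, 128 for IPv6.
ipv4_total = 2 ** v4.max_prefixlen
ipv6_total = 2 ** v6.max_prefixlen

print(v4.version, ipv4_total)  # 4 4294967296
print(v6.version, ipv6_total)
```

About 4.3 billion IPv4 addresses is fewer than the number of people on Earth, which is why the pool ran out.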

41 of 49

What is a domain name?

  • A way of assigning human-readable names to IP addresses (which are difficult to remember)
  • To “Purchase” a domain name (really you’re leasing it), you pay money to an authorized registrar (GoDaddy is the biggest one)
  • The domain name system (DNS) is a hierarchical naming system (big dictionary) that keeps track of IP addresses and their associated domain names
  • Makes it possible to assign domain names to groups of Internet resources and users, regardless of the entities' physical location

42 of 49

What is a search engine?

Name some examples of search engines…

Why would you use one search engine over another?

43 of 49

What is a search engine?

A system for organizing and retrieving Web pages. Search engines perform three basic tasks:

  1. Crawling – where content is discovered
  2. Indexing – where content is analysed and stored in huge databases, along with keywords, context, and other metadata about each website, for easy retrieval later on; and
  3. Retrieval – where a user query fetches a list of relevant pages and sorts them.

44 of 49

How do search engines work?

General process:

  1. Visits a page and makes a copy of it; stores both the link and the contents of the webpage somewhere (i.e. in a database)
  2. Reads all of the link URLs on that page and then visits those pages and makes copies of them
  3. Uses some logic to build a list of keywords for each page
  4. Uses some logic to rank pages according to specific keywords
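Steps 1 to 3 of the process above can be sketched with a tiny in-memory "web". The `PAGES` table, the URLs, and both function names are made up for illustration; a real crawler downloads pages over HTTP and uses far more careful keyword logic than splitting on spaces. Ranking (step 4) is left out of this sketch.

```python
from collections import deque

# Hypothetical mini-web: URL -> (page text, links found on that page).
PAGES = {
    "a.example/": ("cats and dogs", ["a.example/cats", "a.example/dogs"]),
    "a.example/cats": ("cats purr", ["a.example/"]),
    "a.example/dogs": ("dogs bark", ["a.example/cats"]),
}


def crawl(start: str) -> dict[str, str]:
    """Visit pages breadth-first, following links, and keep a copy of each."""
    seen, queue, copies = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        copies[url] = text                  # store the link and its contents
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return copies


def build_index(copies: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of pages that contain it."""
    index: dict[str, set[str]] = {}
    for url, text in copies.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


index = build_index(crawl("a.example/"))
print(sorted(index["cats"]))  # ['a.example/', 'a.example/cats']
```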


45 of 49

Extracting Keywords

  1. HTML content is surrounded by various markup tags (more on that next week)
  2. You can teach the web crawler what the most important keywords are by putting them inside a select set of semantic tags: <h1>, <title>, <main>, <article>
  3. This helps the web crawler to “learn” the structure and organization of your site: <nav>, <ul>, <li>, <a>
  4. This is not only easier for machines: good organization benefits everyone! People, other programmers, etc.
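Here is one way a crawler might pull keywords out of the "important" tags. It is a simplified sketch: the `IMPORTANT_TAGS` set covers only two of the tags named above, the sample page is invented, and real search engines weigh many more signals than this.

```python
from html.parser import HTMLParser

IMPORTANT_TAGS = {"title", "h1"}  # assumption: a small subset for this demo


class KeywordExtractor(HTMLParser):
    """Collects the words that appear inside the important tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many important tags we are currently inside
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag in IMPORTANT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in IMPORTANT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.keywords.extend(data.lower().split())


# Hypothetical page used only for this example.
PAGE = (
    "<html><head><title>Cat Facts</title></head>"
    "<body><h1>All About Cats</h1><p>Cats sleep a lot.</p></body></html>"
)

extractor = KeywordExtractor()
extractor.feed(PAGE)
print(extractor.keywords)  # ['cat', 'facts', 'all', 'about', 'cats']
```

The words in the plain paragraph are ignored, which is the point: text in headings and titles is treated as a stronger hint about what the page covers.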


46 of 49

Ordering Search Results: PageRank Example

  1. PageRank is a famous algorithm devised by Google
    • determines the relevance of a page according to popularity: The more links that point to a webpage, the more useful it will seem, and the higher it will appear in the results
  2. Other important criteria:
    • How often the page is updated; more recent often more relevant (but not always); trustworthy domain; etc.
  3. Why are search engines useful?
  4. In what ways might they be controversial?
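The "more incoming links means a higher rank" idea can be tried out on a toy link graph. This is a simplified version of the published PageRank idea, not Google's production system: the `LINKS` graph is invented, the damping value of 0.85 and the 50 iterations are conventional choices assumed for the demo, and every link target is assumed to also be a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links shares its score with everyone.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: page -> pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph "home" comes out on top because three pages link to it, while "orphan" ends up last because nothing links to it.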


47 of 49

August 2024: Big Antitrust Ruling: Google Search

“Google had paid $26.3 billion in 2021 alone to ensure that its search engine is the default on smartphones and browsers, and to keep its dominant market share.”

“‘The default is extremely valuable real estate,’ Mehta wrote. ‘Even if a new entrant were positioned from a quality standpoint to bid for the default when an agreement expires, such a firm could compete only if it were prepared to pay partners upwards of billions of dollars in revenue share and make them whole for any revenue shortfalls resulting from the change.’”

Source: https://www.reuters.com/legal/us-judge-rules-google-broke-antitrust-law-search-case-2024-08-05/

48 of 49

Example of Gaming Search Engines: “Google Bombs”

Goo·gle bomb. n.

  1. an attempt to make a search term return a website for an unexpected person or organization when entered in a search engine (typically for satirical or humorous purposes) by the creation of numerous links to that website from pages including the search term.


49 of 49

For Friday: Assigned Readings / Materials

We will be discussing some societal issues related to the Internet and the Web. Three short required news articles (from NPR) and a bunch of optional ones:

  • Net Neutrality
  • The rollback of content moderation / fact checking on Meta Platforms
  • TikTok bans

Come to class ready to discuss!