1 of 17

Tutorial 1

CSCI 344: Analyze an Internet Platform

Fall 2024

2 of 17

Announcements

  1. Tutorial 1 – submit to Moodle by Monday, 8/26 @ 11:59PM.
  2. HW1 posted – submit to Moodle by Wednesday, 9/4 @ 11:59PM.
    • You will be presenting your work in two weeks (Friday) during class
  3. Next week we will:
    • Learn about and write some HTML
    • Install and configure VS Code (code editor), Git and GitHub, Node.js, and the V8 JavaScript engine
    • Configure some code formatting tools to validate your HTML and encourage clean coding practices

Please bring your laptop!

3 of 17

Outline

  1. How does a search engine work?
  2. What are cookies?
  3. Tutorial 1: Understanding Internet Tracking

4 of 17

Outline

  • How does a search engine work?
  • What are cookies?
  • Tutorial 1: Understanding Internet Tracking

5 of 17

What is a search engine?

A system for organizing and retrieving Web pages. Search engines perform three basic tasks:

  1. Crawling
    • Visits a page and makes a copy of it; stores both the link and the contents of the webpage somewhere (e.g., in a database)
    • Reads all of the link URLs on that page and then visits those pages and makes copies of them
  2. Indexing – each copied page is analyzed, and its keywords, context, and other metadata are stored in huge databases for easy retrieval later on
  3. Retrieval – a user's query is matched against the index to fetch a list of relevant pages, which are then sorted (ranked).
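
The three tasks above can be sketched in a few lines of Python. This is only a toy illustration, not how a production search engine works: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`), and the function names `crawl`, `build_index`, and `retrieve` are made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("welcome to the cat page", ["a.example/cats", "b.example/dogs"]),
    "a.example/cats": ("cats are great pets", ["a.example/home"]),
    "b.example/dogs": ("dogs are great pets too", []),
}


def crawl(start_url):
    """Crawling: visit a page, copy it, then follow every link found on it."""
    copies, queue = {}, deque([start_url])
    while queue:
        url = queue.popleft()
        if url in copies or url not in TOY_WEB:
            continue  # already copied, or the link leads nowhere
        text, links = TOY_WEB[url]
        copies[url] = text
        queue.extend(links)
    return copies


def build_index(copies):
    """Indexing: map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in copies.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def retrieve(index, query):
    """Retrieval: fetch pages matching any query word, best matches first."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = crawl("a.example/home")
index = build_index(pages)
results = retrieve(index, "great cats")
print(results)
```

Here the sorting step simply counts how many query words a page contains; real engines combine many more signals.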

6 of 17

Can you prevent your website from being crawled?

  • How would you do this?
    • robots.txt
  • Why would you want to do this?
  • Why would you not want to do this?
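
As a small illustration of how robots.txt is read, the sketch below uses Python's standard-library `urllib.robotparser` to check two URLs against a sample rule set. The rules, the crawler name `ExampleBot`, and the example.com URLs are assumptions made up for this example. Note that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honor it, but nothing forces them to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/notes.html")

print(allowed_public, allowed_private)
```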

7 of 17

Ordering Search Results: PageRank Example

  1. PageRank is a famous algorithm devised by Google's founders, Larry Page and Sergey Brin
    • determines the relevance of a page according to popularity: the more links that point to a webpage (and the more important the pages those links come from), the more useful it will seem, and the higher it will appear in the results
  2. Other important criteria:
    • How often the page is updated; more recent often more relevant (but not always); trustworthy domain; etc.
  3. Why are search engines useful?
  4. In what ways might they be controversial?
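
To make the "more links means higher rank" idea concrete, here is a minimal sketch of the textbook PageRank iteration on a tiny made-up link graph. The graph (`TOY_LINKS`), the damping factor of 0.85, and the iteration count are assumptions for illustration; this is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home"; nothing links to "orphan".
TOY_LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
    "orphan": ["home"],
}

ranks = pagerank(TOY_LINKS)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

Because every other page links to "home", it ends up with the highest rank, while "orphan" (no incoming links) ends up with the lowest.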

8 of 17

Example of Gaming Search Engines: “Google Bombs”

Goo·gle bomb. n.

  1. an attempt to make a search term return a website for an unexpected person or organization when entered in a search engine (typically for satirical or humorous purposes) by the creation of numerous links to that website from pages including the search term.

9 of 17

Extracting Keywords & Search Engine Optimization

  1. HTML content is surrounded by various markup tags (more on that next week)
  2. You can teach the web crawler what the most important keywords are by putting them inside a select set of semantic tags: <h1>, <title>, <main>, <article>
  3. This helps the web crawler to “learn” the structure and organization of your site: <nav>, <ul>, <li>, <a>
  4. This is not only easier for machines: good organization benefits everyone! People, other programmers, etc.
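
A rough sketch of how a crawler might pull keywords out of semantic tags, using Python's standard-library `html.parser`. The sample page, the class name `KeywordExtractor`, and the choice to look only at <title> and <h1> are assumptions for this example; real crawlers weigh many more signals.

```python
from html.parser import HTMLParser

# Assumption: treat text inside these tags as the page's main keywords.
KEYWORD_TAGS = {"title", "h1"}


class KeywordExtractor(HTMLParser):
    """Collects the words that appear inside the tags in KEYWORD_TAGS."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # keyword tags we are currently inside
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag in KEYWORD_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if self.open_tags:
            self.keywords.extend(data.lower().split())


# Hypothetical page for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Campus Hiking Club</title></head>
  <body>
    <nav><ul><li><a href="/trips">Trips</a></li></ul></nav>
    <main><article><h1>Fall Trips</h1><p>We meet every Saturday.</p></article></main>
  </body>
</html>
"""

extractor = KeywordExtractor()
extractor.feed(SAMPLE_HTML)
keywords = extractor.keywords
print(keywords)
```

The paragraph text and the navigation link are skipped because they sit outside the chosen tags.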

10 of 17

August 2024: Big Antitrust Ruling: Google Search

“Google had paid $26.3 billion in 2021 alone to ensure that its search engine is the default on smartphones and browsers, and to keep its dominant market share.”

“The default is extremely valuable real estate,” Mehta wrote. “Even if a new entrant were positioned from a quality standpoint to bid for the default when an agreement expires, such a firm could compete only if it were prepared to pay partners upwards of billions of dollars in revenue share and make them whole for any revenue shortfalls resulting from the change.”

Source: https://www.reuters.com/legal/us-judge-rules-google-broke-antitrust-law-search-case-2024-08-05/

11 of 17

Outline

  • How does a search engine work?
  • What are cookies?
  • Tutorial 1: Understanding Internet Tracking

12 of 17

Cookies

Cookies are small bits of text that a website can store on your local computer.

  1. Traditionally, cookies have been used by the site you access (first-party cookies) to preserve some contextual information about your preferences to enhance user experience.
  2. Third-party cookies are created by domains other than the one you are visiting directly.
    1. Used by data brokers and ad networks to gather behavioral data
    2. Used by live chat popups
    3. Set by social media buttons embedded in the website

13 of 17

Planet Money Podcast: How the cookie became a monster

  • What stood out to you?
  • Why did Lou Montulli create the cookie while working at Netscape?
  • What is DoubleClick? What did DoubleClick do?
  • How do third-party cookies work?

14 of 17

Browser’s Local Storage / Cookies

[Figure: the cookies stored in a browser. Labels: Information Services Website (Google: Google-assigned user id, other metadata); Online Shopping Website (Amazon: Amazon-assigned user id, other metadata); X (third-party user id, listed twice); a Banner (Ad) on each website; Tracking Company; Third-Party Cookies.]
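
The idea that a tracking company can recognize the same browser on different websites can be simulated in a few lines. This is a simplified sketch, not real browser behavior: the cookie jar is a plain dictionary keyed by domain, and the `Tracker` class, the domain names, and the id format are all hypothetical.

```python
import itertools


class Tracker:
    """Hypothetical ad company whose banner is embedded on several sites."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.history = {}  # third-party user id -> sites where the banner loaded

    def serve_banner(self, cookie_jar, site):
        # The browser sends the tracker's own cookie along with the banner request.
        user_id = cookie_jar.get("tracker.example")
        if user_id is None:
            # First time this browser is seen: assign an id and store it.
            user_id = f"user-{next(self._next_id)}"
            cookie_jar["tracker.example"] = user_id
        self.history.setdefault(user_id, []).append(site)
        return user_id


browser_cookie_jar = {}  # domain -> cookie value
tracker = Tracker()

for site in ["news.example", "shop.example"]:
    # First-party cookie: each site only knows its own id for the visitor.
    browser_cookie_jar[site] = f"{site}-assigned-id"
    # Third-party cookie: the same tracker banner loads on both sites.
    tracker.serve_banner(browser_cookie_jar, site)

profile = tracker.history[browser_cookie_jar["tracker.example"]]
print(profile)
```

Each website sees only its own cookie, but the tracker's single id ties the visits together into one browsing profile.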

15 of 17

An effort to move away from third-party cookies...

  • Safari and Firefox are at various stages of banning third-party cookies
  • Google was planning to remove a widely used tracking technology from its Chrome web browser – despite complaints from rivals that rely on it to target ads at individuals.

“Google cited positive test results for a technology that analyzes users’ browsing habits on their own devices, without sending sensitive data to central servers, and said it expects to open outside testing of ad buys using the technology in the second quarter.”

  • Google keeps pushing back the date for sunsetting third-party cookies
  • Unclear that new tracking technologies will be any better

16 of 17

Bottom Line

“The debate over third-party cookies underscores a dilemma when it comes to regulating big tech companies: Protecting user privacy and promoting online competition can sometimes be at odds because one of tech’s most popular business models is targeting advertising at individuals based on their online behavior.”

17 of 17

Tutorial 1: