SICSS Atlanta 2022
Announcements
The Plan for Today
SICSS Sample Project
Day 2
Trying to find a new apartment
Where is the Data From?
Information Asymmetries in Housing Markets
Counts
Rates
Total Sample
Counts
Rates
Results
What happened after SICSS
Digital Trace Data
Readymades vs. Custommades
Cleo the Clownfish from the Shedd Aquarium
What is digital trace data
Strengths of digital trace data
Weaknesses of digital trace data
Application Programming Interfaces
Testing some easy APIs where you control the call
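A minimal sketch of what "controlling the call" can look like in base R: the call is just a URL assembled from an endpoint and a set of parameters. The endpoint `api.example.com`, the parameter names, and the helper `build_query` are hypothetical, chosen only for illustration.

```r
# Hypothetical endpoint and parameters, used only for illustration.
base_url <- "https://api.example.com/v1/search"
params <- list(q = "atlanta housing", limit = 10)

# URL-encode each value and join the name=value pairs into a query string.
build_query <- function(base_url, params) {
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

call_url <- build_query(base_url, params)
print(call_url)
```

Changing a value in `params` changes the call. Pasting the printed URL into a browser is a quick way to see what a real API would return for it.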
R Packages for APIs
Throttling & Rate Limiting
Rate limiting: client-side response to the maximum capacity of a channel
Throttling: server-side response providing feedback to the caller that there are too many requests coming in from that client or that the server is overloaded
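The two ideas above can be sketched in base R. This is an illustrative sketch, not a production client: `make_rate_limited`, `retry_with_backoff`, and `fake_request` are hypothetical helpers, and the fake request stands in for a real API call that returns HTTP status 429 (Too Many Requests) when the server throttles.

```r
# Client-side pacing: wait at least `min_interval` seconds between calls.
make_rate_limited <- function(f, min_interval = 1) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# Reacting to server-side throttling: retry with exponential backoff
# whenever the request reports status 429.
retry_with_backoff <- function(request, max_tries = 4, base_wait = 0.5) {
  for (attempt in seq_len(max_tries)) {
    response <- request()
    if (response$status != 429) return(response)
    if (attempt < max_tries) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  response  # still throttled after max_tries attempts
}

# Stand-in for a real API call: throttled twice, then succeeds.
calls <- 0
fake_request <- function() {
  calls <<- calls + 1
  list(status = if (calls <= 2) 429 else 200)
}

paced_request <- make_rate_limited(fake_request, min_interval = 0.1)
result <- retry_with_backoff(paced_request, base_wait = 0.1)
print(result$status)
```

With a real API, the wait times would normally follow the limits stated in that API's documentation rather than the small values used here.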
Screen scraping basics in R
Screen-Scraping
Adapted from SICSS Day 2
Is it Illegal?
Let’s Try
Setting up the R Environment
install.packages("rvest")
install.packages("selectr")
Tell R we want to use those packages now
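Assuming both packages installed without errors, loading them would look like this:

```r
library(rvest)    # reading and parsing HTML
library(selectr)  # translating CSS selectors to XPath
```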
Find what we want to scrape
Read this source code into R
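A minimal sketch of this step. To keep the example runnable offline, it reads a small hypothetical page from a string; for a live page, the URL of the site would be passed to `read_html()` instead.

```r
library(rvest)

# Hypothetical inline page standing in for the site chosen above.
# For a live site, pass its URL as a string instead of the HTML text.
html_source <- "<html><head><title>Demo page</title></head>
<body><h1>Rankings</h1></body></html>"

page <- read_html(html_source)
print(class(page))
```

The result is a parsed document object rather than readable text, which is why the following steps are needed to pull out specific pieces.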
The website is now in R… but we need to parse the HTML file
Right-click on the part of the webpage you want to scrape and choose ‘Inspect’
Right-click inside the developer window and select Copy, then Copy XPath
Now we can use this information to point R in the right direction
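A sketch of pointing R at one node with a copied XPath, assuming rvest 1.0 or later (older versions call the same function `html_node()`). The inline page and the XPath string are hypothetical stand-ins for the real site and for whatever the browser copies.

```r
library(rvest)

# Hypothetical page containing one table inside a div.
page <- read_html('<html><body>
  <div id="content"><table class="ranking">
    <tr><th>Country</th><th>Rank</th></tr>
    <tr><td>A</td><td>1</td></tr>
  </table></div>
</body></html>')

# XPath as it might be copied from the browser developer window.
table_xpath <- '//*[@id="content"]/table'

# Select the single node that the XPath points to.
table_node <- html_element(page, xpath = table_xpath)
print(html_attr(table_node, "class"))
```

One thing to watch for: XPaths copied from a browser sometimes include a `tbody` step that is not present in the raw source R downloads. If the node comes back missing, removing `/tbody` from the path may help.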
Put the information back into Table form
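Once a table node has been selected, `html_table()` can turn it into a data frame. A self-contained sketch on a hypothetical two-row table, assuming rvest 1.0 or later:

```r
library(rvest)

# Hypothetical page with a small table.
page <- read_html('<html><body><table>
  <tr><th>Country</th><th>Rank</th></tr>
  <tr><td>A</td><td>1</td></tr>
  <tr><td>B</td><td>2</td></tr>
</table></body></html>')

# Select the table node, then convert it to a data frame.
table_node <- html_element(page, xpath = "//table")
ranking <- html_table(table_node)
print(ranking)
```

The header row supplies the column names, so the result can be used like any other data frame.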
Complications
Parsing with a CSS Selector
Scraping Duke’s Main Page (duke.edu)
Try clicking around the site with SelectorGadget to identify the XPath
Feed the information to R
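A sketch of feeding a selector to R, assuming rvest 1.0 or later and assuming the selector identified is a CSS selector. The inline page and the selector `a.headline` are hypothetical; on the real site, the selector would be whatever was identified by clicking around the page.

```r
library(rvest)

# Hypothetical stand-in for a page with a list of headlines.
page <- read_html('<html><body><ul class="news">
  <li><a class="headline" href="/a">First story</a></li>
  <li><a class="headline" href="/b">Second story</a></li>
</ul></body></html>')

# Select every node matching the CSS selector, then pull out text and links.
headlines <- html_elements(page, css = "a.headline")
titles <- html_text(headlines, trim = TRUE)
links <- html_attr(headlines, "href")

print(titles)
print(links)
```

Unlike a copied XPath, which usually points at one node, a CSS selector like this one can match many nodes at once, so the results come back as vectors.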
Other Complications