Google Sheets: Importing and data prep
Adrian D. Garcia
@adriandgarcia | garcia.d.adrian@gmail.com
The Financial Times Specialist
bit.ly/3wLFoVh
Session Outcomes
Let’s Meet Our Data
import_contributions.csv – A text file of campaign contributions
warnreport.pdf – A California WARN report
TSA checkpoint travel numbers
Let’s start with text files
.txt – a generic extension, could be anything
Let’s start with text files
.txt – a generic extension, could be anything
.csv – most common; each piece of data is separated by a comma
Let’s start with text files
.txt – a generic extension, could be anything
.csv – most common; each piece of data is separated by a comma
.tsv or .tab – tab-separated values
Let’s start with text files
.txt – a generic extension, could be anything
.csv – most common; each piece of data is separated by a comma
.tsv or .tab – tab-separated values
You’ll get custom delimiters sometimes: |~;
Import settings
On the Sheets file, add a new tab and name it Contributions. Go to File → Import → Upload.
Find import_contributions.csv
Avoid This Mistake
Not opening a CSV through import can truncate data starting with a zero such as Northeastern zip codes or ID numbers
Pulling in HTML tables
Works well for static tables
Pulling in HTML tables
=importHTML(“URL”, “table”, 1)
URL = The URL of the webpage you are trying to scrape.
table = This parameter take two options: “table” or “list”.
1 = the number of the table on the page.
Other resources
Pulling data from PDFs
Pulling data from PDFs
native PDFs (text is selectable)
image PDFs (basically a picture of a piece of paper, requires Optical Character Recognition to extract text)
Pulling data from PDFs
Resources for OCRs
Data Prep
Data Prep