1 of 18

Google Sheets: Importing and data prep

Adrian D. Garcia

@adriandgarcia | garcia.d.adrian@gmail.com

The Financial Times Specialist

bit.ly/3wLFoVh

2 of 18

Session Outcomes

  1. Practice importing data stored as text, extracting data from PDFs and pulling data live from HTML tables

  • Use a few key functions and features to reshape and reformat data

3 of 18

Let’s Meet Our Data

bit.ly/ire-sheets-importing

import_contributions.csv – A text file of campaign contributions

warnreport.pdf – A California WARN report

TSA checkpoint travel numbers

4 of 18

Let’s start with text files

.txt – a generic extension, could be anything

5 of 18

Let’s start with text files

.txt – a generic extension, could be anything

.csv – most common; each piece of data is separated by a comma

6 of 18

Let’s start with text files

.txt – a generic extension, could be anything

.csv – most common; each piece of data is separated by a comma

.tsv or .tab – tab-separated values

7 of 18

Let’s start with text files

.txt – a generic extension, could be anything

.csv – most common; each piece of data is separated by a comma

.tsv or .tab – tab-separated values

You’ll get custom delimiters sometimes: |~;

8 of 18

Import settings

On the Sheets file, add a new tab and name it Contributions. Go to File → Import → Upload.

Find import_contributions.csv

9 of 18

Avoid This Mistake

Not opening a CSV through import can truncate data starting with a zero such as Northeastern zip codes or ID numbers

10 of 18

Pulling in HTML tables

Works well for static tables

11 of 18

Pulling in HTML tables

=importHTML(“URL”, “table”, 1)

URL = The URL of the webpage you are trying to scrape.

table = This parameter take two options: “table” or “list”.

1 = the number of the table on the page.

12 of 18

Other resources

13 of 18

Pulling data from PDFs

14 of 18

Pulling data from PDFs

native PDFs (text is selectable)

image PDFs (basically a picture of a piece of paper, requires Optical Character Recognition to extract text)

15 of 18

Pulling data from PDFs

16 of 18

Resources for OCRs

17 of 18

Data Prep

18 of 18

Data Prep