1 of 27

Scraping with simple programming

2 of 27

This is a follow-up to my tutorial "Scraping without programming"

You may want to check that one out first.

3 of 27

What is Google Apps Script?

Google Apps Script is code (technically JavaScript) that lives inside your Google tools. Like:

  • Google Sheets
  • Gmail
  • Google Docs
  • Google Calendar
  • Google Forms
  • Google Contacts

4 of 27

Check out a few examples!

5 of 27

There are also third-party apps for this...

But learning the actual code frees you from their limitations.

6 of 27

We're going to scrape the web with Google Sheets

7 of 27

Create a blank Google Sheet

and select Tools > Script editor…

This is where you will write your code!

8 of 27

the beauty of Google Apps Script

You don't have to download or install any programming languages, libraries, third-party apps, etc.

This code is "bound" to this particular Google Sheet - meaning it will only affect this one Sheet, ever.

You don't need a separate bit of code to run it on a timer

You can put your data right into a usable spreadsheet, as opposed to a csv

9 of 27

It starts with an empty function

10 of 27

We're going to start with this sample code

Copy and paste it into your script, replacing the blank function.

Click Save and then Debug (the little bug icon).

Debugging lets you see the results of your script without "running" it, which would affect your Google Sheet.

11 of 27

Error!

You probably got the error message "ReferenceError: "getElementsByClassName" is not defined."

That's because pretty much any coding script in the world needs other scripts to rely on. It's dependent on them.

12 of 27

In this case, the "getElementsByClassName" function is not defined - that means the computer doesn't know was it is. We have to tell the computer by copying and pasting the other bits of code on the tutorial:

"getElementById()", "getElementsByClassName()", and "getElementsByTagName()"

Copy and paste those three into your script.

Click Save and then Debug again.

13 of 27

What happens?

NOTHING!

Or, at least, it should. A few things to note:

The "getElementBy" snippets should come before the doGet function. This is because the computer reads things top to bottom, like you and I do.

You might need to select a function to debug - you can do this via the dropdown menu next to the debug button.

14 of 27

Debugging and finding nothing means you have no bugs!

15 of 27

Now try Running the function

Click Save and then click the Play button, to the left of the bug.

Flip back to the Google Sheet that contains this script - it should be open in another tab.

See anything?

16 of 27

Let's change our last functions

Go here and copy and paste this new doGet() function, replacing the old one.

It also adds another function, writetospreadsheet(), that actually adds all this stuff to the spreadsheet.

Save it and Run it, making sure "writetospreadsheet" is the function selected in the dropdown menu.

Now look at your sheet - it has data in it!

17 of 27

Set a timer

You can click the little Clock button, next to the Save button, to run the script automatically at a certain time or based on another trigger.

18 of 27

So what did we do?

19 of 27

20 of 27

Let's change the 'a' tag

Instead of "a", try "b", "span" or "td"

Remember, the HTML Dictionary is your friend.

What do we have available? Maybe check the source code.

21 of 27

Changing the "a" to "td" gives us all the table data - but not in the right columns!

22 of 27

You need an array for that

Copy this new code, replacing the earlier doGet() and writetospreadsheet(). This one outputs your data in an array.

Use this code: https://github.com/samanthasunne/scrape-ire-job-array

23 of 27

Last one:

Let's try an RSS Feed

24 of 27

This script scrapes an RSS Feed, rather than a webpage, similar to the =importRSS function in Google Sheets.

25 of 27

26 of 27

Get all the scripts here:

27 of 27

For more tools

I reviewed OutWit Hub and DownThemAll for my newsletter, Tools for Reporters. You can find new, useful tools there - scraping and otherwise - every week.

For questions, contact me at samanthasunne@gmail.com or @samanthasunne. If I don’t know an answer, I can point you to one.