1 of 65

Spreadsheet

Quick Data Analytics & Put it all together

Pili Hu

e@hupili.net

2 of 65

Homeworks and Projects

3 of 65

Homework 5 - eCharts

  • Your story?
  • Why do you pick this chart to tell this particular story?
  • What annotation do you leverage?
  • How does animation help your story telling?

4 of 65

Project

  • Project idea

5 of 65

Project

  • Theme: A story with web-based interactive visualisation
  • Deliverable:
    • Website
    • Report
    • Presentation (April 29)
  • Size of a group: 3 students
  • Topic:
    • Self proposed
    • Mentor proposed

More about projects and mentors

6 of 65

Data in the News Room

7 of 65

“Interview” data for news stories

8 of 65

Touchable infographic for the blind

9 of 65

Turn data into sounds

10 of 65

Visualise/ Gamify the data

http://salary360.initiumlab.com/#/

Anyone want to make an updated version based on Hong Kong 2016 by-census data?

11 of 65

Demand in media industry

12 of 65

A Sample JD of “Data Journalist”

Graphics Dataviz UI Skills Design Illustrator InDesign Photoshop D3 Javascript HTML CSS GIS ArcGIS QGIS ArcGIS TopoJSON Three.js WebGL PostGIS Statistics Graphics. R�

Categories:

Graphics Design, Web Dev, Mapping, Web 3D, Analysis

http://data-journalism-jobs.silk.co/page/Data-Journalist-Bloomberg-UK

13 of 65

Non Programming Data Tools

Data Visualization

  • Google Sheets
  • Google Fusion table
  • Google Drawing
  • Tableau
  • Baidu Echart
  • Infogr.am
  • Kumu
  • Netlytic
  • AI / PS

Data Collection:

  • Scraper (Chrome extension)
  • Copy and paste
  • Google Cache
  • Wayback Machine
  • Import.io

Data Cleaning

  • Google Refine
  • Google Sheets
  • PDF tables
  • Small PDF
  • Tabula

Data Analysis

  • Google spreadsheet
    • Basic manipulation
    • Function
    • Pivoting
    • Charting
    • Scripting (.gs)
  • Google fusion table

14 of 65

Programming ensures long-term competitiveness

New tools every year…

Tools can change...

Tools may not be maintained...

15 of 65

Good news

  • Google spreadsheet
    • Basic manipulation
    • Function
    • Pivoting
    • Charting
    • Scripting (.gs)
  • Google fusion table

90% work can be done by Spreadsheet!

16 of 65

Sample:

HK District Council Election Data

17 of 65

Meet the data

Data:

  • From 1999 to 2015
  • # of Candidates: 4392
    • Name, occupation, party, camp, votes
  • # of Constituencies: 2039
    • Total votes, voting rate, count of voters, population

18 of 65

Meet the sources

Methodology:

  • Automatic
    • Scraping
  • Semi-automatic:
    • Copy-and-paste a few table from the website
    • Data cleaning by human
  • Manual input from books
    • Labour intensive
  • Investigation

19 of 65

Manpower overview

Metric

Value

# of unique participants

8

Data collection/ cleaning

720 man-hours (3 months)

Data validation

24 man-hours (3 days)

Data analysis

50 man-hours (6 days)

Project span

5 months

Manpower overview of the large data collection campaign

20 of 65

Distribution of sources & time

1999

2003

2007

2011

2015

個人信息

(年齡)

手動抄書

(3)

手動抄書

(3)

手動抄書

(3)

手動抄書

(3)

自動抓取睇嘢

(0.5)

個人信息

(性別、職業)

手動抄書

(6)

手動抄書

(6)

手動抄書

(6)

區選網站/手動

(2)

區選網站/自動

(1)

政黨派別

(政黨)

手動抄書

(3)

手動抄書

(3)

手動抄書

(3)

區選網站/手動

(1)

區選網站/自動

(0.5)

政黨派別

(泛/建/其他)

起底+標註

(130)

起底+標註

(130)

起底+標註

(130)

起底+標註

(130)

起底+標註

(130)

選區信息

(居民數、選民數、投票率)

區選網站/手動

(2.1)

區選網站/手動

(2.1)

區選網站/手動

(2.1)

區選網站/手動

(2.1)

區選網站/自動

(1.1)

選舉結果

(得票率)

手動抄書

(3)

手動抄書

(3)

手動抄書

(3)

手動抄書

(3)

missing

(0)

Notation: Source (man-hour)

Research/ investigation consumes significant more time

Online accessible/ (semi-) formatted data saves time

Importance of open data and knowledge sharing

21 of 65

Hong Kong District Council (Disco)

Final output:

22 of 65

Sample: HK Disco Evolution - Camp

23 of 65

Sample: HK Disco Evolution - Gender

24 of 65

Sample: HK Disco Evolution - Age

25 of 65

Camp Evolution on Map

Produced by Google Fusion table. Not covered in this workshop

26 of 65

Camp Evolution on Map - Sham Shui Po

Produced by Google Fusion table. Not covered in this workshop

27 of 65

Get the data

28 of 65

Spreadsheet

(solves 90% problems)

29 of 65

30 of 65

Basic Cell Operation

  • Cell operation
  • Cell naming
    • First alphabet is column name
    • Second digit is row number
    • e.g. A1, C3, ...

31 of 65

Keyboard Shortcuts

https://support.google.com/docs/answer/181110?hl=en

Learn by use.

Can come back to this page later.

32 of 65

Auto Fill

  • Relative & Absolute cell locationing
    • Use “$” to lock column or row
  • Data type and fill
    • Try date type for example

33 of 65

Formula

  • Arithmetic
    • + - * /
    • SUM(), AVERAGE(), MIN(), MAX()
  • String
    • LEFT(), MID(), RIGHT(), ...
    • CONCATENATE()
    • SPLIT()
  • Others:
    • COUNTIF()
    • IF()
    • VLOOKUP()

34 of 65

Find help

2. Live help document

35 of 65

JOIN()

36 of 65

IF()

37 of 65

Random Sampling

  • =RANDOM()
  • Sort
  • Clip # of elements you want

38 of 65

Conditional Formatting

39 of 65

Charting

  • Insert chart
  • Modify chart meta data
  • Embed chart

40 of 65

Pivot Table

Key concepts:

  • Group by row, column
    • Grouping can be nested
  • Filter records
  • Aggregate grouped and filtered records into values
  • Sort groups

41 of 65

Pivot Table - flexible arrangements

42 of 65

Group Exercise

43 of 65

Group Exercise

Use what learned so far to reproduce the sample and try to go beyond

44 of 65

Quizzes and Tricks for Table

45 of 65

Cross Tab Reference

  • Usage: =’Sheet 1’!A1
  • Common case:
    • Usually you don’t edit pivot table results directly because it is automatically generated
    • Use cross-tab reference to copy the whole result table to another Tab

46 of 65

Edit filtered values?

Common practice:

  • Add a new column for (very complex) filter criterion
  • Use pivot table to extract by the criterion column
  • Use cross-tab reference to copy pivot result to a new working area

Live filter:

  • Good to explore interactively
  • Persistence is an issue: next time someone might filter to different criteria

47 of 65

VLOOKUP()

48 of 65

Horizontal lookup?

  • VLOOKUP is for vertical lookup
  • Try to tweak for horizontal lookup

49 of 65

General parameters change upon auto-fill?

Question:

  • When using auto-fill function, cell-ID (e.g. “C3”) will change according to the auto-fill direction.
  • What if you need other parameters to change?
    • E.g. for first 100 rows, multiply by 2, for next 100 rows, multiply by 3, ..

Answer:

  • Use CELL formula to retrieve the column or row number.

50 of 65

Lists (columns) intersection/ difference

Use COUNTIF

51 of 65

Quick auto-fill

Problem:

  • Dragging is time consuming and error prone when there are many rows and columns.

Solution:

  • Select the cells you want to auto-fill and press Command+D (Ctrl+D)

52 of 65

IMAGE()

53 of 65

SPLIT()

=SPLIT()

54 of 65

Time functions

  • TODAY()
  • NOW()

55 of 65

Add multiple rows/columns

  • Highlight the number of rows or columns you want to add
  • Right click and do “Insert N columns/ rows …”

56 of 65

Random Sampling

  • =RANDOM()
  • Sort
  • Clip # of elements you want

57 of 65

Error handling

Error is common during computation:

  • Divide by zero
  • Missing value
  • Ill-formatted values

However, your data workflow should continue.

Solution:

  • =IFERROR()

58 of 65

TRANSPOSE()

59 of 65

Keywords Counting

  • SPLIT()
  • TRANSPOSE()
  • pivot table

60 of 65

Custom function

1

2

3

Basically you can do anything with Javascript.

https://developers.google.com/apps-script/guides/sheets/functions

61 of 65

Recap

62 of 65

Recap

  • Spreadsheet
    • Pivot table
  • Web-based visualisation
  • End-to-end production

63 of 65

Common Questions

64 of 65

Less (color) is more (story)

After

Before

65 of 65

Homework review & class discussion