Spreadsheet
Quick Data Analytics & Putting It All Together
Homework and Projects
Homework 5 - eCharts
Project
More about projects and mentors
Data in the News Room
“Interview” data for news stories
Touchable infographic for the blind
Turn data into sounds
Visualise/gamify the data
http://salary360.initiumlab.com/#/
Does anyone want to make an updated version based on the Hong Kong 2016 by-census data?
Demand in media industry
http://data-journalism-jobs.silk.co/
Updated to 2015
A Sample JD of “Data Journalist”
Graphics Dataviz UI Skills Design Illustrator InDesign Photoshop D3 Javascript HTML CSS GIS ArcGIS QGIS ArcGIS TopoJSON Three.js WebGL PostGIS Statistics Graphics. R
Categories:
Graphics Design, Web Dev, Mapping, Web 3D, Analysis
http://data-journalism-jobs.silk.co/page/Data-Journalist-Bloomberg-UK
Non Programming Data Tools
Data Visualization
Data Collection:
Data Cleaning
Data Analysis
Programming ensures long-term competitiveness
New tools every year…
Tools can change...
Tools may not be maintained...
Good news
90% of the work can be done in a spreadsheet!
Sample:
HK District Council Election Data
Meet the data
Data:
Meet the sources
Methodology:
Manpower overview
Metric | Value |
# of unique participants | 8 |
Data collection/ cleaning | 720 man-hours (3 months) |
Data validation | 24 man-hours (3 days) |
Data analysis | 50 man-hours (6 days) |
Project span | 5 months |
Manpower overview of the large data collection campaign
Distribution of sources & time
| 1999 | 2003 | 2007 | 2011 | 2015 |
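Personal info (age) | Manual transcription from books (3) | Manual transcription from books (3) | Manual transcription from books (3) | Manual transcription from books (3) | Automated scraping, 睇嘢 (0.5) |
Personal info (gender, occupation) | Manual transcription from books (6) | Manual transcription from books (6) | Manual transcription from books (6) | Election website/manual (2) | Election website/automated (1) |
Political affiliation (party) | Manual transcription from books (3) | Manual transcription from books (3) | Manual transcription from books (3) | Election website/manual (1) | Election website/automated (0.5) |
Political affiliation (pan-democratic/pro-establishment/other) | Background research + annotation (130) | Background research + annotation (130) | Background research + annotation (130) | Background research + annotation (130) | Background research + annotation (130) |
Constituency info (residents, registered voters, turnout) | Election website/manual (2.1) | Election website/manual (2.1) | Election website/manual (2.1) | Election website/manual (2.1) | Election website/automated (1.1) |
Election results (vote share) | Manual transcription from books (3) | Manual transcription from books (3) | Manual transcription from books (3) | Manual transcription from books (3) | missing (0) |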
Notation: source (man-hours)
Research and investigation consume significantly more time
Online-accessible, (semi-)formatted data saves time
Importance of open data and knowledge sharing
Hong Kong District Council (Disco)
Final output:
Sample: HK Disco Evolution - Camp
Sample: HK Disco Evolution - Gender
Sample: HK Disco Evolution - Age
Camp Evolution on Map
Produced with Google Fusion Tables. Not covered in this workshop.
Camp Evolution on Map - Sham Shui Po
Produced with Google Fusion Tables. Not covered in this workshop.
Get the data
Spreadsheet
(solves 90% of problems)
Basic Cell Operation
Keyboard Shortcuts
Auto Fill
Formula
Find help
1. Function list:
https://support.google.com/docs/table/25273?visit_id=1-636249304870053806-836666939&hl=en&rd=2
2. Live help document
JOIN()
IF()
Random Sampling
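One common way to draw a random sample in a spreadsheet is to add a helper column of random numbers (for example with RAND()), sort by that column, and keep the first n rows. The sketch below mirrors that idea in JavaScript. The helper name `sampleRows` and the candidate labels are made up for illustration.

```javascript
// Hypothetical helper: pick n rows at random without replacement.
// Mirrors the spreadsheet trick of adding a random helper column,
// sorting by it, and keeping the top n rows.
function sampleRows(rows, n) {
  return rows
    .map((row) => ({ row, key: Math.random() })) // helper column of random numbers
    .sort((a, b) => a.key - b.key)               // sort by the helper column
    .slice(0, n)                                 // keep the first n rows
    .map((entry) => entry.row);                  // drop the helper column
}

// Made-up labels for illustration only.
const candidates = ["A", "B", "C", "D", "E", "F"];
console.log(sampleRows(candidates, 3)); // three distinct labels; differs on every run
```

Because the sample is drawn by sorting, no row can be picked twice.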
Conditional Formatting
Charting
Pivot Table
Key concepts:
Pivot Table - flexible arrangements
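A pivot table groups rows by one or two fields and aggregates a value for each group. As a rough illustration of that idea outside the spreadsheet, the JavaScript sketch below counts rows for each (year, camp) pair. The rows are placeholders, not the real election data, and `pivotCount` is a hypothetical helper name.

```javascript
// Hypothetical helper: count rows for each combination of a row field
// and a column field, like a pivot table that summarises by count.
function pivotCount(rows, rowField, colField) {
  const table = {};
  for (const r of rows) {
    const rowKey = r[rowField];
    const colKey = r[colField];
    if (!table[rowKey]) table[rowKey] = {};
    table[rowKey][colKey] = (table[rowKey][colKey] || 0) + 1;
  }
  return table;
}

// Placeholder rows for illustration only (not real data).
const rows = [
  { year: 2011, camp: "X", gender: "F" },
  { year: 2011, camp: "Y", gender: "M" },
  { year: 2015, camp: "X", gender: "M" },
  { year: 2015, camp: "X", gender: "F" },
];

console.log(pivotCount(rows, "year", "camp"));
console.log(pivotCount(rows, "year", "gender")); // same data, different arrangement
```

Swapping the field names is the code equivalent of dragging a different field into the pivot table's rows or columns: the source data stays untouched and only the arrangement changes.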
Group Exercise
Use what you have learned so far to reproduce the sample, then try to go beyond it
Quizzes and Tricks for Table
Cross Tab Reference
Edit filtered values?
Common practice:
Live filter:
VLOOKUP()
Horizontal lookup?
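VLOOKUP searches the first column of a range for a key and returns the value from another column of the matching row. HLOOKUP does the same across the first row. The JavaScript sketch below imitates the exact-match case to make the mechanics explicit. The function names and the small table are illustrative assumptions, and a missing key returns null here rather than a spreadsheet error.

```javascript
// Exact-match vertical lookup: find `key` in the first column of `table`
// and return the value at 1-based column `index` of that row.
function vlookup(key, table, index) {
  const row = table.find((r) => r[0] === key);
  return row === undefined ? null : row[index - 1];
}

// Swap rows and columns so a horizontal lookup can reuse vlookup.
function transpose(table) {
  return table[0].map((_, c) => table.map((r) => r[c]));
}

// Exact-match horizontal lookup: find `key` in the first row of `table`
// and return the value at 1-based row `index` of that column.
function hlookup(key, table, index) {
  return vlookup(key, transpose(table), index);
}

// Placeholder table for illustration only.
const districts = [
  ["code", "name"],
  ["A01", "District One"],
  ["A02", "District Two"],
];

console.log(vlookup("A02", districts, 2));  // District Two
console.log(hlookup("name", districts, 3)); // District Two
```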
General parameters change upon auto-fill?
Question:
Answer:
List (column) intersection/difference
Use COUNTIF
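The COUNTIF trick works by counting, for each value in one column, how many times it appears in the other column. A count above zero means the value is in both lists; a count of zero means it appears only in the first. A sketch of the same logic in JavaScript, with made-up lists and hypothetical helper names:

```javascript
// Count how many cells in `range` equal `value` (the role COUNTIF plays).
function countIf(range, value) {
  return range.filter((cell) => cell === value).length;
}

// Values of `a` that also appear in `b`.
const intersection = (a, b) => a.filter((v) => countIf(b, v) > 0);

// Values of `a` that do not appear in `b`.
const difference = (a, b) => a.filter((v) => countIf(b, v) === 0);

// Made-up lists for illustration only.
const list2011 = ["Ann", "Bob", "Cat"];
const list2015 = ["Bob", "Cat", "Dan"];

console.log(intersection(list2011, list2015)); // [ 'Bob', 'Cat' ]
console.log(difference(list2011, list2015));   // [ 'Ann' ]
```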
Quick auto-fill
Problem:
Solution:
IMAGE()
SPLIT()
=SPLIT()
Time functions
Add multiple rows/columns
Random Sampling
Error handling
Errors are common during computation:
However, your data workflow should continue.
Solution:
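In Google Sheets, one common way to keep a workflow running is IFERROR(value, fallback), which returns the fallback when the first argument evaluates to an error. Whether this is the solution the slide had in mind is an assumption. The same idea sketched in JavaScript, with a hypothetical `ifError` helper and placeholder numbers:

```javascript
// Hypothetical helper: run a computation and return `fallback` if it throws
// or yields a non-finite number (for example, division by zero).
function ifError(compute, fallback) {
  try {
    const value = compute();
    if (typeof value === "number" && !Number.isFinite(value)) return fallback;
    return value;
  } catch (err) {
    return fallback;
  }
}

// Placeholder rows: votes divided by voters, where one voter count is zero.
const rows = [
  { votes: 50, voters: 200 },
  { votes: 10, voters: 0 },
];
const shares = rows.map((r) => ifError(() => r.votes / r.voters, "n/a"));
console.log(shares); // [ 0.25, 'n/a' ]
```

The bad row is flagged with a fallback value instead of stopping the rest of the column from being computed.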
TRANSPOSE()
Keywords Counting
Custom function
Basically you can do anything with Javascript.
https://developers.google.com/apps-script/guides/sheets/functions
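A minimal sketch of a custom function, following the pattern described in the Apps Script guide linked above: a plain JavaScript function that receives either a single cell value or a two-dimensional array for a range. The name `COUNT_KEYWORD` and its behaviour are illustrative assumptions, not the function shown in the workshop.

```javascript
/**
 * Counts case-insensitive occurrences of a keyword in a cell or range.
 * Hypothetical example; in a sheet it might be called as
 * =COUNT_KEYWORD(A2:A10, "election").
 *
 * @param {string|Array<Array<string>>} input A single cell value or a range.
 * @param {string} keyword The keyword to count.
 * @return {number} Total number of occurrences.
 * @customfunction
 */
function COUNT_KEYWORD(input, keyword) {
  const needle = String(keyword).toLowerCase();
  if (needle === "") return 0; // an empty keyword matches nothing

  // A range arrives as a 2D array; wrap a single value so both cases share one path.
  const cells = Array.isArray(input) ? input.flat() : [input];

  let total = 0;
  for (const cell of cells) {
    const text = String(cell).toLowerCase();
    total += text.split(needle).length - 1; // occurrences = number of pieces - 1
  }
  return total;
}
```

Once saved in the spreadsheet's script project, a function like this can be used in a cell formula like any built-in function.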
Recap
Common Questions
Less (color) is more (story)
After
Before
Homework review & class discussion