Lecture 6

Charts; Census

DATA 8

Fall 2018

Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu)

Announcements

Data Visualization

Types of Data

All values in a column should be of the same type and comparable to each other in some way

• Numerical: each value is from a numerical scale
    • Numerical measurements are ordered
    • Differences are meaningful
• Categorical: each value is from a fixed inventory
    • May or may not have an ordering
    • Categories are the same or different

“Numerical” Data

Just because the values are numbers doesn’t mean the variable is numerical

• Census example has numerical SEX code (0, 1, and 2)

• It doesn’t make sense to perform arithmetic on these “numbers”, e.g. 1 - 0 or (0+1+2)/3 are meaningless

• The variable SEX is still categorical, even though numbers were used for the categories
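
A small sketch of why arithmetic on category codes is unhelpful, using only the Python standard library. The list of codes is made up for illustration and is not taken from the census file.

```python
from collections import Counter
from statistics import mean

# Hypothetical SEX codes for six rows (made up for illustration).
sex_codes = [1, 2, 2, 0, 1, 2]

# Python will happily average the codes, but the result describes nothing
# about the people behind them.
meaningless_average = mean(sex_codes)

# Counting how often each category occurs is a meaningful summary.
code_counts = Counter(sex_codes)

print(meaningless_average)
print(code_counts)
```

The average runs without error, which is exactly the trap: the language cannot tell that the codes are labels rather than measurements.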

Plotting Two Numerical Variables

Line graph: plot

Scatter plot: scatter

(Image: Anthony Daniels, actor. Source: https://en.wikipedia.org/wiki/C-3PO)

Table Review

Table Structure

• A Table is a sequence of labeled columns
• Labels are strings
• Columns are arrays, all with the same length
Name        Code  Area (m2)
California  CA    163696
Nevada      NV    110567

(Diagram callouts: Label, Column, Row)
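
One way to picture this structure is a plain-Python model: a dictionary that maps each label to a column. This is only a sketch and not the datascience Table class. The columns here are lists rather than arrays, and the variable names are illustrative.

```python
# Plain-Python model of a table: string labels mapped to equal-length columns.
states = {
    'Name': ['California', 'Nevada'],
    'Code': ['CA', 'NV'],
    'Area (m2)': [163696, 110567],
}

labels = list(states)  # column labels, in order

# Every column must have the same length.
column_lengths = {len(column) for column in states.values()}
assert len(column_lengths) == 1

num_rows = column_lengths.pop()
num_columns = len(labels)

# A row is the item at one index of every column, taken together.
first_row = tuple(states[label][0] for label in labels)

print(labels, num_rows, num_columns, first_row)
```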

Table Methods

• Creating and extending tables:
    • Table().with_column and Table.read_table
• Finding the size: num_rows and num_columns
• Referring to columns: labels, relabeling, and indices
    • labels and relabeled; column indices start at 0
• Accessing data in a column:
    • column takes a label or index and returns an array
• Using array methods to work with data in columns:
    • item, sum, min, max, and so on
• Creating new tables containing some of the original columns:
    • select, drop

(Demo)

Manipulating Rows

• t.sort(column) sorts the rows in increasing order
• t.take(row_numbers) keeps the numbered rows
    • Each row has an index, starting at 0
• t.where(column, are.condition) keeps all rows for which a column's value satisfies a condition
• t.where(column, value) keeps all rows for which a column's value equals some particular value
• t.with_row makes a new table that has another row
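
The row operations above can be mimicked in plain Python to see what each one does. This is a rough analogue using lists of tuples, not the datascience implementation, and the rows are made up for illustration.

```python
# Each row is a (name, position, salary) tuple; the data is hypothetical.
rows = [
    ('Ann', 'PG', 12.0),
    ('Bo', 'C', 20.5),
    ('Cy', 'PG', 18.0),
]

# Like t.sort(column): rows in increasing order of salary.
by_salary = sorted(rows, key=lambda row: row[2])

# Like t.take([0, 2]): keep the rows at indices 0 and 2.
taken = [rows[i] for i in [0, 2]]

# Like t.where(column, are.above(15)): keep rows that satisfy a condition.
above_15 = [row for row in rows if row[2] > 15]

# Like t.where(column, 'PG'): keep rows whose value equals 'PG'.
point_guards = [row for row in rows if row[1] == 'PG']

# Like t.with_row(...): build a new list; the original stays unchanged.
extended = rows + [('Di', 'SF', 9.0)]

print(len(rows), len(extended))
```

Each step produces a new list and leaves `rows` as it was, which mirrors how the table methods return new tables.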

Lists

Lists are Generic Sequences

A list is a sequence of values (just like an array), but the values can all have different types

[2+3, 'four', Table().with_column('K', [3, 4])]

• Lists can be used to create table rows.
• If you create a table column from a list, it will be converted to an array automatically
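
A short sketch of the difference between a generic list and a column-like sequence, in plain Python. The helper `is_homogeneous` is a hypothetical name introduced here for illustration. It is not part of any library.

```python
# A list may hold values of different types.
mixed = [2 + 3, 'four', 6.0]
value_types = [type(value).__name__ for value in mixed]

# A list can describe one table row: one value per column.
row = ['California', 'CA', 163696]


def is_homogeneous(values):
    """Return True if every value has the same type as the first one."""
    return all(type(value) is type(values[0]) for value in values)


print(value_types)
print(is_homogeneous(mixed), is_homogeneous([3, 4]))
```

A row naturally mixes types because each value belongs to a different column, while a column is expected to pass a check like `is_homogeneous`.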

(Demo)

Discussion Questions

The table nba has columns NAME, POSITION, and SALARY.

a) Create an array containing the names of all point guards (PG) who make more than $15M/year

guards = nba.where('POSITION', 'PG')

guards.where('SALARY', are.above(15)).column('NAME')

b) After evaluating these two expressions in order, what's the result of the second one?

nba.with_row(['Bernie', 'Mascot', 100])

nba.where('NAME', are.containing('Bern'))

Census Data

The Decennial Census

• Every ten years, the Census Bureau counts how many people there are in the U.S.

• In between censuses, the Bureau estimates how many people there are each year.

• Article 1, Section 2 of the Constitution:
• “Representatives and direct Taxes shall be apportioned among the several States … according to their respective Numbers …”

Analyzing Census Data

Leads to the discovery of interesting features and trends in the population

(Demo)

Census Table Description

• Values have column-dependent interpretations
    • The SEX column: 1 is Male, 2 is Female
    • The POPESTIMATE2010 column: 7/1/2010 estimate
• In this table, some rows are sums of other rows
    • The SEX column: 0 is Total (of Male + Female)
    • The AGE column: 999 is Total of all ages
• Numeric codes are often used for storage efficiency
• Values in a column have the same type, but are not necessarily comparable (AGE 12 vs AGE 999)
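
Because some rows are sums of other rows, adding up a whole column counts people more than once. The sketch below shows the effect in plain Python. The counts are made up, and only the codes (SEX 0 and AGE 999 as totals) come from the description above.

```python
# Hypothetical (SEX, AGE, POPESTIMATE2010) rows with made-up counts.
census_rows = [
    (0, 0, 30), (1, 0, 16), (2, 0, 14),
    (0, 1, 50), (1, 1, 24), (2, 1, 26),
    (0, 999, 80), (1, 999, 40), (2, 999, 40),
]

# Summing every row counts each person several times.
naive_total = sum(population for _, _, population in census_rows)

# Keep only detail rows: drop SEX 0 (total) and AGE 999 (all ages).
detail_total = sum(
    population
    for sex, age, population in census_rows
    if sex != 0 and age != 999
)

# The single row with both total codes already holds the overall total.
reported_total = next(
    population
    for sex, age, population in census_rows
    if sex == 0 and age == 999
)

print(naive_total, detail_total, reported_total)
```

In this toy data the naive sum is four times the true total, since each person appears in a detail row, a SEX total row, an AGE total row, and the grand total row.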

Growth Rate

• Growth rate = g (for example 3%, or 0.03)
• Initial value x, final value y after t periods of time

Value after 1 period = x + xg = x * (1+g)

Value after 2 periods = x(1+g)(1+g) = x * (1+g) ** 2

Value after t periods = y = x * (1+g) ** t

So (1+g) ** t = y/x and so 1+g = (y/x) ** (1/t)

So g = (y/x) ** (1/t) - 1
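
The final formula translates directly into a short Python function. This is a minimal sketch. The function name `growth_rate` and the sample numbers are illustrative and do not come from the census data.

```python
def growth_rate(x, y, t):
    """Per-period growth rate g such that y == x * (1 + g) ** t."""
    return (y / x) ** (1 / t) - 1


# Hypothetical example: a value grows from 100 to 121 over 2 periods.
g = growth_rate(100, 121, 2)

# Check the result by applying the growth rate forward again.
recovered = 100 * (1 + g) ** 2

print(g, recovered)
```

Here `g` comes out at approximately 0.10 (10% per period), and applying it forward for two periods recovers a value very close to 121, up to floating-point rounding.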
