Lecture 5

Building Tables

DATA 8

Fall 2019

Announcements

Weekly Goals

- Today:
  - Creating tables from scratch
  - Manipulating columns as arrays

- Later this week:
  - Table review
  - Visualizing data
  - Working with Census data
  - Distributions

Arrays

An array contains a sequence of values

- All elements of an array should have the same type
- Arithmetic is applied to each element individually
- Adding arrays adds elements (if same length!)
- A column of a table is an array
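The array properties above can be sketched as follows. This is a minimal illustration that assumes NumPy is imported as np; the values are made up.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

doubled = a * 2   # arithmetic is applied to each element individually
total = a + b     # same-length arrays are added element by element

# Mixing an int and a float gives an array with a single common type.
mixed = np.array([1, 2.5])

print(doubled)      # [2 4 6]
print(total)        # [11 22 33]
print(mixed.dtype)  # float64
```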

(Demo)

Ranges

A range is an array of consecutive numbers

- np.arange(end): An array of increasing integers from 0 up to end
- np.arange(start, end): An array of increasing integers from start up to end
- np.arange(start, end, step): A range with step between consecutive values

The range always includes start but excludes end
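A short sketch of the three forms of np.arange, assuming NumPy is imported as np. The variable names and arguments are illustrative only.

```python
import numpy as np

first_five = np.arange(5)        # starts at 0, stops before 5
middle = np.arange(2, 6)         # starts at 2, stops before 6
by_threes = np.arange(0, 10, 3)  # steps by 3, stops before 10

print(first_five)  # [0 1 2 3 4]
print(middle)      # [2 3 4 5]
print(by_threes)   # [0 3 6 9]
```

In each case the start value appears in the result and the end value does not.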

Ways to create a table

- Table.read_table(filename) - reads a table from a spreadsheet
- Table() - an empty table
- select, where, sort, and so on - each creates a new table

Example

Charles Joseph Minard, 1781-1870

- French civil engineer who created one of the greatest graphs of all time
- Visualized Napoleon's 1812 invasion of Russia, including:
  - the number of soldiers
  - the direction of the march
  - latitude and longitude
  - temperature on the return journey
  - dates in November and December

Some of Minard’s Data

(Demo)

Discussion Question

Use the table functions we learned last week to find the southernmost city along the army’s retreat.

Table Methods

- Creating and extending tables:
  - Table().with_column and Table.read_table
- Finding the size: num_rows and num_columns
- Referring to columns: labels, relabeling, and indices
  - labels and relabeled; column indices start at 0
- Accessing data in a column:
  - column takes a label or index and returns an array
- Using array methods to work with data in columns:
  - item, sum, min, max, and so on
- Creating new tables containing some of the original columns:
  - select, drop

(Demo)

Manipulating Rows

- t.sort(column) sorts the rows in increasing order of the values in column
- t.take(row_numbers) keeps the numbered rows
  - Each row has an index, starting at 0
- t.where(column, are.condition) keeps all rows for which a column's value satisfies a condition
- t.where(column, value) keeps all rows containing a certain value in a column