Data Types
CSCI 104: Understanding
Data Through Computation
Williams College, Fall 2022
Announcements
Topics
We’re building data science skills to answer questions from real data
Today’s domain: Art!
Question you should be able to answer by the end of today’s lecture:
How much did someone pay (in today’s US dollars) for this painting at an auction house in 1804?
Review: Basic Table Operations
Our Handy Reference: https://www.cs.williams.edu/~cs104/auto/python-library-ref.html
Operation | Description |
t.select(label) | Creates a new table with just the specified columns |
t.drop(label) | Creates a new table in which the specified columns are omitted |
t.sort(label) | Creates a new table with rows sorted by the specified column |
t.where(label, condition) | Creates a new table with just the rows that match the condition |
t.barh(categories, values) | Displays a horizontal bar chart with one bar per category, where each bar's length is given by the values column |
1. Tables: Art sales in the UK
Review: Arithmetic Operations
Operation | Operator | Example | Value |
Addition | + | 2 + 3 | 5 |
Subtraction | - | 2 - 3 | -1 |
Multiplication | * | 2 * 3 | 6 |
Division | / | 7 / 3 | 2.333 |
Exponentiation | ** | 2 ** 0.5 | 1.414 |
Remainder | % | 7 % 3 | 1 |
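To check the table, you can evaluate each row directly in Python. This is a minimal sketch; the variable names are only for illustration. Note that `/` always produces a float, even when both operands are integers.

```python
# Each expression from the arithmetic table, evaluated in Python.
addition = 2 + 3         # 5
subtraction = 2 - 3      # -1
multiplication = 2 * 3   # 6
division = 7 / 3         # 2.3333333333333335 (always a float)
exponent = 2 ** 0.5      # 1.4142135623730951 (the square root of 2)
remainder = 7 % 3        # 1, because 7 = 2 * 3 + 1

print(addition, subtraction, multiplication, remainder)  # 5 -1 6 1
print(round(division, 3), round(exponent, 3))            # 2.333 1.414
```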
New!
Basic Types of Values in Python
Type | Description | Examples |
int | Integers (whole numbers) | 0, 13, -4 |
float | Real numbers, written with a decimal point | 0.0, 13.444, -4.2 |
str (string) | Characters, words, phrases, text | "hello", "goodbye" |
bool (boolean) | Exactly one of two values | True, False |
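Python's built-in `type()` function reports which kind of value you have. The sketch below also shows that `13.0` is a float even though it is a whole number, because it is written with a decimal point.

```python
# type() reports which kind of value Python is holding.
values = [13, 13.0, 13.444, "hello", True]
for v in values:
    print(repr(v), type(v).__name__)
# 13 int
# 13.0 float
# 13.444 float
# 'hello' str
# True bool
```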
String Operations
Operation | Operator | Example | Value |
Concatenation | + | "cs" + "104" | "cs104" |
Convert value to string | str() | str(2.1) | "2.1" |
Convert string to integer | int() | int("2") | 2 |
Convert string to float | float() | float("2.1") | 2.1 |
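Here is a short sketch of each string operation from the table. It also shows why the conversions matter: `+` cannot combine a string with a number, so you must convert the number with `str()` first.

```python
course = "cs" + "104"      # concatenation: "cs104"
label = str(2.1)           # number -> string: "2.1"
whole = int("2")           # string -> int: 2
decimal = float("2.1")     # string -> float: 2.1

# Adding a str and an int is a TypeError, so convert the number first.
try:
    broken = "cs" + 104
except TypeError:
    print("TypeError: convert the number with str() first")
fixed = "cs" + str(104)    # "cs104"

print(course, label, whole, decimal, fixed)  # cs104 2.1 2 2.1 cs104
```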
2. Data Types
Table
not_a_painting.show()
Another table operation
Operation | Description |
t.select(label) | Creates a new table with just the specified columns |
t.drop(label) | Creates a new table in which the specified columns are omitted |
… | … |
t.column(column_name_or_index) | Returns the values in the specified column as an array |
New!
Arrays
not_a_painting.column("pounds")
Result (an array): 16, 22, 51, 0, 0, 56
A selected column of a table is an array.
All items of an array must be the same type (e.g. int).
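In the datascience library used in this course, table columns are NumPy arrays. The sketch below uses NumPy directly, assuming the six "pounds" values shown on the slide, to show what "same type" means in practice: if you mix types, NumPy converts every item to one common type.

```python
import numpy as np

# The six "pounds" values shown on the slide, as a NumPy array.
pounds = np.array([16, 22, 51, 0, 0, 56])
print(pounds.dtype.kind)   # 'i': every item is an integer

# Mixing an int and a float: every item becomes a float.
mixed = np.array([16, 22.5])
print(mixed.dtype)         # float64

# Mixing a number and a string: every item becomes a string.
texty = np.array([16, "hello"])
print(texty.dtype.kind)    # 'U': every item is a (Unicode) string
```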
Arrays
not_a_painting.column("pounds")
Index | 0 | 1 | 2 | 3 | 4 | 5 |
Value | 16 | 22 | 51 | 0 | 0 | 56 |
Note: In computer science we start with index 0 (not 1)!
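Because counting starts at 0, an array with 6 items has indices 0 through 5. This minimal sketch uses NumPy's `item` method on the slide's "pounds" values. It shows the first and last items, and what happens if you ask for index 6.

```python
import numpy as np

pounds = np.array([16, 22, 51, 0, 0, 56])

first = pounds.item(0)               # index 0 is the FIRST item: 16
last = pounds.item(len(pounds) - 1)  # 6 items, so the last index is 5: 56
print(first, last)                   # 16 56

# Index 6 does not exist in a 6-item array.
try:
    pounds.item(6)
except IndexError:
    print("index 6 is out of range for 6 items")
```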
Making Arrays From Scratch
fives = make_array(5, 10, 15, 20, 25, 30)
fives: 5, 10, 15, 20, 25, 30
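`make_array` comes from the datascience library and returns a NumPy array. If you don't have that library installed, you can assume `np.array` on a list is an equivalent stand-in for these values, as in this sketch:

```python
import numpy as np

# Stand-in for make_array(5, 10, 15, 20, 25, 30) from the datascience library.
fives = np.array([5, 10, 15, 20, 25, 30])
print(len(fives))      # 6
print(fives.item(0))   # 5
print(fives.sum())     # 105
```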
Broadcast Operations
not_a_painting.column("pounds") + 5
16, 22, 51, 0, 0, 56  + 5  →  21, 27, 56, 5, 5, 61
The 5 is added to every item of the array.
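A broadcast operation applies a single number to every item at once, with no loop needed. This sketch uses NumPy with the slide's "pounds" values. It shows that the result is a new array and the original is left unchanged, and that the result matches an explicit item-by-item loop.

```python
import numpy as np

pounds = np.array([16, 22, 51, 0, 0, 56])
plus_five = pounds + 5            # the 5 is applied to every item

print(plus_five.tolist())         # [21, 27, 56, 5, 5, 61]
print(pounds.tolist())            # [16, 22, 51, 0, 0, 56] (unchanged)

# Same result as adding 5 to each item one at a time.
looped = [p + 5 for p in pounds.tolist()]
print(looped == plus_five.tolist())  # True
```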
Accessing Items in an Array
not_a_painting.column("pounds").item(3)
Index | 0 | 1 | 2 | 3 | 4 | 5 |
Value | 16 | 22 | 51 | 0 | 0 | 56 |
Result: 0 (the value at index 3)
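Here is the same lookup as a NumPy sketch, assuming the slide's "pounds" values. `item(3)` returns the fourth item as a plain Python number. Square-bracket indexing, `pounds[3]`, finds the same item but keeps it as a NumPy number.

```python
import numpy as np

pounds = np.array([16, 22, 51, 0, 0, 56])

by_item = pounds.item(3)     # 0: the 4th item, since counting starts at 0
by_bracket = pounds[3]       # 0: same item via square-bracket indexing

print(by_item, type(by_item).__name__)  # 0 int
print(by_item == by_bracket)            # True
```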
Broadcast Operations
not_a_painting.column("pounds") + fives
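  16, 22, 51,  0,  0, 56   (pounds)
+  5, 10, 15, 20, 25, 30   (fives)
= 21, 32, 66, 20, 25, 86
Items are added position by position, so the two arrays must be the same length.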
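When both sides are arrays, NumPy pairs up items by position. This sketch, assuming the slide's "pounds" and `fives` values, reproduces the sum. It then shows that arrays of different lengths (here 6 and 3) cannot be combined this way.

```python
import numpy as np

pounds = np.array([16, 22, 51, 0, 0, 56])
fives = np.array([5, 10, 15, 20, 25, 30])

total = pounds + fives           # item 0 + item 0, item 1 + item 1, ...
print(total.tolist())            # [21, 32, 66, 20, 25, 86]

# Arrays of different lengths cannot be paired up item by item.
try:
    pounds + np.array([5, 10, 15])
except ValueError:
    print("lengths 6 and 3 do not match")
```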
Another table operation
Operation | Description |
t.select(label) | Creates a new table with just the specified columns |
t.drop(label) | Creates a new table in which the specified columns are omitted |
… | … |
t.take(row_indices) | Creates a new table with only the rows at the given indices |
New!
Ranges
np.arange(0, 5)
Result: 0, 1, 2, 3, 4 (starts at 0; the stop value 5 is not included)
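`np.arange(start, stop)` counts up from `start` and stops just before `stop`. An optional third argument sets the step size. A range is handy as a list of indices: `t.take(np.arange(0, 5))` should pick the first five rows of a table. The sketch below shows the same idea on a single array using NumPy's `ndarray.take`, assuming the slide's "pounds" values.

```python
import numpy as np

first_five = np.arange(0, 5)       # stop value 5 is excluded
by_fives = np.arange(5, 35, 5)     # step of 5; 35 is excluded
print(first_five.tolist())         # [0, 1, 2, 3, 4]
print(by_fives.tolist())           # [5, 10, 15, 20, 25, 30]

# Use a range of indices to pick items by position.
pounds = np.array([16, 22, 51, 0, 0, 56])
first_three = pounds.take(np.arange(0, 3))
print(first_three.tolist())        # [16, 22, 51]
```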
3. Arrays
Think-pair-share: Make a work plan
Operations |
t.select(label) |
t.drop(label) |
t.sort(label) |
t.where(label, condition) |
t.barh(categories, values) |
t.column(column_name_or_index) |
t.take(row_indices) |
Q: What are the top 5 paintings by the artist Vandyck ranked by highest price (in pounds)?
4. Applied questions