1 of 23

Data Types

CSCI 104: Understanding Data Through Computation

Williams College, Fall 2022

2 of 23

Announcements

  • Prelab 2 due Monday
  • Lab 2 is available
    • You already have most of what you need to answer all the questions
  • Questions on the material so far?

3 of 23

Topics

  • Table operations review
  • Data types and casting
  • Arrays and array broadcasting

4 of 23

We’re building data science skills to answer questions from real data

Today’s domain: Art!

Question you should be able to answer by the end of today’s lecture:

How much did someone pay (in today’s US dollars) for this painting at an auction house in 1804?

5 of 23

Review: Basic Table Operations

Operation                     Description
t.select(label)               Creates a new table with just the specified columns
t.drop(label)                 Creates a new table in which the specified columns are omitted
t.sort(label)                 Creates a new table with rows sorted by the specified column
t.where(label, condition)     Creates a new table with just the rows that match the condition
t.barh(categories, values)    Displays a horizontal bar chart with one bar per category, each bar's length given by the values column

6 of 23

1. Tables: Art sales in the UK

7 of 23

Review: Arithmetic Operations

Operation          Operator   Example     Value
Addition           +          2 + 3       5
Subtraction        -          2 - 3       -1
Multiplication     *          2 * 3       6
Division           /          7 / 3       2.333
Exponentiation     **         2 ** 0.5    1.4142
Remainder (New!)   %          7 % 3       1
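The rows of the arithmetic table can be checked directly in Python. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Each line mirrors one row of the arithmetic table.
total = 2 + 3        # addition
difference = 2 - 3   # subtraction
product = 2 * 3      # multiplication
quotient = 7 / 3     # division always produces a float
root = 2 ** 0.5      # exponentiation; a power of 0.5 is a square root
remainder = 7 % 3    # what is left over after dividing 7 by 3

print(total, difference, product, remainder)  # 5 -1 6 1
print(round(quotient, 3), round(root, 4))     # 2.333 1.4142
```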

8 of 23

Basic Types of Values in Python

Type             Description                                           Examples
int              Integers                                              0, 13, -4
float            Real-valued numbers (written with a decimal point)    0.0, 13.444, -4.2
str (string)     Characters, words, phrases, text                      "hello", "goodbye"
bool (boolean)   Can take only two values                              True, False
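One way to see which type a value has is Python's built-in type() function. The sketch below uses illustrative variable names; note that Python itself calls the string type str and the boolean type bool.

```python
# One example value for each basic type.
count = 13        # int
price = 13.444    # float
greeting = "hello"  # str
sold = True       # bool

print(type(count).__name__)     # int
print(type(price).__name__)     # float
print(type(greeting).__name__)  # str
print(type(sold).__name__)      # bool

# Mixing an int with a float in arithmetic gives a float.
mixed = count + 0.5
print(type(mixed).__name__)     # float
```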

9 of 23

String Operations

Operation                    Operator   Example         Value
Concatenation                +          "cs" + "104"    "cs104"
Convert value to string      str()      str(2.1)        "2.1"
Convert string to integer    int()      int("2")        2
Convert string to float      float()    float("2.1")    2.1
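Casting matters because + means different things for strings and for numbers. A short sketch with illustrative variable names shows the difference:

```python
# Concatenation and conversion, as in the table of string operations.
course = "cs" + "104"    # "cs104"
as_text = str(2.1)       # "2.1"
whole = int("2")         # 2
decimal = float("2.1")   # 2.1

# With strings, + glues text together; with numbers, + adds.
joined = "2" + "3"             # "23"
summed = int("2") + int("3")   # 5

print(course, as_text, whole, decimal)
print(joined, summed)
```

Data read from a file often arrives as strings, so a price column may need int() or float() before any arithmetic makes sense.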

10 of 23

2. Data Types

11 of 23

Table

not_a_painting.show()

12 of 23

Another table operation

Operation                         Description
t.select(label)                   Creates a new table with just the specified columns
t.drop(label)                     Creates a new table in which the specified columns are omitted
t.column(column_name_or_index)    Returns an array containing the values of the specified column (New!)

13 of 23

Arrays

16   22   51   0   0   56

All items of an array must be the same type (e.g. int)

not_a_painting.column("pounds")

A selected column of a table is an array

14 of 23

Arrays

not_a_painting.column("pounds")

Index of item in array:   0    1    2    3    4    5
Value of item in array:   16   22   51   0    0    56

Note: In computer science we start with index 0 (not 1)!

15 of 23

Making Arrays From Scratch

fives = make_array(5, 10, 15, 20, 25, 30)

5   10   15   20   25   30

16 of 23

Broadcast Operations

not_a_painting.column("pounds") + 5

  16   22   51   0   0   56
+ 5
= 21   27   56   5   5   61
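Broadcasting a single number across an array can be sketched with NumPy. The assumption here is that the array returned by not_a_painting.column("pounds") behaves like a NumPy array; pounds is a stand-in name holding the values shown on the slide.

```python
import numpy as np

# Stand-in for not_a_painting.column("pounds"), with the slide's values.
pounds = np.array([16, 22, 51, 0, 0, 56])

# Adding a single number applies the addition to every item.
plus_five = pounds + 5

print(plus_five.tolist())  # [21, 27, 56, 5, 5, 61]
print(pounds.tolist())     # [16, 22, 51, 0, 0, 56] (the original is unchanged)
```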

17 of 23

Accessing Items in an Array

not_a_painting.column("pounds").item(3)

Index:   0    1    2    3    4    5
Value:   16   22   51   0    0    56

Result: 0
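Item access can be tried on a NumPy array directly. In this sketch, pounds is a stand-in for the "pounds" column with the values from the slide, on the assumption that the column is a NumPy array.

```python
import numpy as np

# Stand-in for the "pounds" column, with the slide's values.
pounds = np.array([16, 22, 51, 0, 0, 56])

first = pounds.item(0)   # indexing starts at 0, so this is the first item
fourth = pounds.item(3)  # index 3 is the fourth item
last = pounds.item(len(pounds) - 1)  # the last index is the length minus 1

print(first, fourth, last)  # 16 0 56
```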

18 of 23

Broadcast Operations

not_a_painting.column("pounds") + fives

  16   22   51   0    0    56
+ 5    10   15   20   25   30
= 21   32   66   20   25   86
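Adding two arrays of the same length combines them item by item. The sketch below assumes that make_array produces an ordinary NumPy array, so np.array is used in its place; pounds again stands in for the "pounds" column.

```python
import numpy as np

# Stand-ins for the two arrays on the slide.
pounds = np.array([16, 22, 51, 0, 0, 56])
fives = np.array([5, 10, 15, 20, 25, 30])

# Items at the same index are added together.
combined = pounds + fives

print(combined.tolist())  # [21, 32, 66, 20, 25, 86]
```

Both arrays have six items here; arrays of different lengths generally cannot be added this way.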

19 of 23

Another table operation

Operation              Description
t.select(label)        Creates a new table with just the specified columns
t.drop(label)          Creates a new table in which the specified columns are omitted
t.take(row_indices)    Creates a new table with only the rows at the given indices (New!)

20 of 23

Ranges

np.arange(0, 5)

0   1   2   3   4
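A quick check of how np.arange treats its end points, as a minimal sketch with illustrative variable names:

```python
import numpy as np

# The start value is included; the stop value is not.
first_five = np.arange(0, 5)

# With one argument, the range starts at 0.
also_first_five = np.arange(5)

print(first_five.tolist())       # [0, 1, 2, 3, 4]
print(also_first_five.tolist())  # [0, 1, 2, 3, 4]
print(len(first_five))           # 5
```

An array like this is a convenient way to name the first few row indices of a table.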

21 of 23

3. Arrays

22 of 23

Think-pair-share: Make a work plan

Operations

  • t.select(label)
  • t.drop(label)
  • t.sort(label)
  • t.where(label, condition)
  • t.barh(categories, values)
  • t.column(column_name_or_index)
  • t.take(row_indices)

Q: What are the top 5 paintings by the artist Vandyck ranked by highest price (in pounds)?
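One possible work plan is: keep only the Vandyck rows, sort them by price from highest to lowest, then take the first five. The sketch below is a plain-Python analogue of that plan, not the table code itself; every row in it is made-up toy data, and the names sales, vandyck, ranked, and top_five are hypothetical. With the table operations above, the same three steps would likely correspond to where, sort (in descending order), and take.

```python
# Hypothetical toy rows; titles and prices are invented for illustration.
sales = [
    {"artist": "Vandyck", "title": "Painting A", "pounds": 16},
    {"artist": "Vandyck", "title": "Painting B", "pounds": 22},
    {"artist": "Other artist", "title": "Painting C", "pounds": 90},
    {"artist": "Vandyck", "title": "Painting D", "pounds": 51},
    {"artist": "Vandyck", "title": "Painting E", "pounds": 0},
    {"artist": "Vandyck", "title": "Painting F", "pounds": 56},
    {"artist": "Other artist", "title": "Painting G", "pounds": 12},
    {"artist": "Vandyck", "title": "Painting H", "pounds": 5},
    {"artist": "Vandyck", "title": "Painting I", "pounds": 30},
]

# Step 1: keep only rows that match the condition (like where).
vandyck = [row for row in sales if row["artist"] == "Vandyck"]

# Step 2: sort by price, highest first (like sort in descending order).
ranked = sorted(vandyck, key=lambda row: row["pounds"], reverse=True)

# Step 3: keep the rows at indices 0 through 4 (like take).
top_five = ranked[:5]

for row in top_five:
    print(row["title"], row["pounds"])
```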

23 of 23

4. Applied questions