1 of 20

Lecture 3

Expressions

DATA 8

Summer 2017

Slides created by John DeNero (denero@berkeley.edu) and Sam Lau (samlau95@berkeley.edu)

2 of 20

Announcements

3 of 20

Review: Cause & Effect

4 of 20

Comparison

  • Group by some treatment and measure some outcome
  • Simplest setting: a treatment group and a control group
  • If the outcome differs between these two groups, that's evidence of an association (or relation)
    • E.g., the top-tier chocolate eaters died of heart disease at a lower rate (12%) than chocolate abstainers (17%)
  • If the two groups are similar in all ways but the treatment, a difference in the outcome is also evidence of causality

5 of 20

Confounding

  • If the treatment and control groups have systematic differences other than the treatment itself, then it might be difficult to identify a causal link
  • When these systematic differences lead researchers astray, they are called confounding factors
  • Such differences are often present in observational studies
    • Observational study: the researcher does not choose which subjects receive the treatment
    • Controlled experiment: the researcher designs a procedure for selecting the treatment and control groups

6 of 20

Randomize!

  • When subjects are split up randomly, it's unlikely that there will be systematic differences between the groups
  • And it's possible to account for the chance that a difference arises from the random split alone
  • Therefore, randomized controlled experiments are the most reliable way to establish causal relations
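
A minimal sketch of splitting subjects at random, assuming 20 made-up subject labels and two equal-sized groups. The names subjects, treatment, and control are hypothetical and only illustrate the idea.

```python
import random

# Hypothetical subject labels; a real study would use its own identifiers.
subjects = ["subject_" + str(i) for i in range(20)]

# random.sample with k=len(subjects) returns a shuffled copy of the list.
shuffled = random.sample(subjects, k=len(subjects))

# First half becomes the treatment group, second half the control group.
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in which group, no trait of the subjects can systematically steer them toward the treatment.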

7 of 20

Expressions

8 of 20

Programming Languages

  • Python is popular both for data science & general software development
  • Mastering the language fundamentals is critical
  • Learn through practice, not by reading or listening
  • Follow along: data8.haas.berkeley.edu

(Demo)

9 of 20

Arithmetic Operators

Operation        Operator   Example     Value
Addition         +          2 + 3       5
Subtraction      -          2 - 3       -1
Multiplication   *          2 * 3       6
Division         /          7 / 3       2.33333
Remainder        %          7 % 3       1
Exponentiation   **         2 ** 0.5    1.41421
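
The operators listed above can be tried directly in Python. This is a small sketch; the dictionary name results is just for illustration.

```python
# Each arithmetic operator applied to the example operands from the slide.
results = {
    "2 + 3": 2 + 3,        # addition
    "2 - 3": 2 - 3,        # subtraction
    "2 * 3": 2 * 3,        # multiplication
    "7 / 3": 7 / 3,        # division always produces a float
    "7 % 3": 7 % 3,        # remainder after dividing 7 by 3
    "2 ** 0.5": 2 ** 0.5,  # exponentiation; a power of 0.5 is a square root
}

for expression, value in results.items():
    print(expression, "=", value)
```

Python prints the division and exponentiation results with more digits than the rounded values shown on the slide.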

10 of 20

Example: Slopes

11 of 20

12 of 20

Much better

(Demo)

13 of 20

Numbers

(Demo)

14 of 20

Ints and Floats

Python has two real number types

  • int: an integer of any size
  • float: a number with an optional fractional part

An int never has a decimal point; a float always does

A float might be printed using scientific notation

Three limitations of float values:

  • They have limited size (but the limit is huge)
  • They have limited precision: about 15-16 significant decimal digits
  • After arithmetic, the final few decimal places can be wrong
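
A short sketch of these points, using only the standard sys module. The specific numbers (2 ** 200, 1e20, 0.1 + 0.2) are arbitrary examples chosen for illustration.

```python
import sys

# An int can be arbitrarily large and is always printed in full.
big_int = 2 ** 200

# Division always produces a float, even when the result is a whole number.
quotient = 6 / 3

# Large floats are displayed using scientific notation.
large_float = 1e20

# Limited size: multiplying past the largest float gives infinity.
largest = sys.float_info.max
overflowed = largest * 10

# Limited precision: the number of decimal digits a float reliably keeps.
reliable_digits = sys.float_info.dig

# After arithmetic, the last few digits can be wrong.
total = 0.1 + 0.2

print(type(big_int), type(quotient))
print(repr(large_float))
print(largest, overflowed)
print(reliable_digits)
print(total, total == 0.3)
```

The last line shows why floats are usually compared with a small tolerance rather than with ==.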

15 of 20

Discussion Question

Rank the results of the following expressions in order from least to greatest

     Expression          Value
  1. 3 * 10 ** 10        30000000000
  2. 10 * 3 ** 10        590490
  3. (10 * 3) ** 10      590490000000000
  4. 10 / 3 / 10         0.33333333333333337
  5. 10 / (3 / 10)       33.333333333333336
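
One way to check a ranking is to let Python evaluate and sort the expressions. This is a sketch; the names values and ranking are illustrative.

```python
# The five expressions from the question, keyed by their number.
# ** is applied before * and /, and a chain of / is evaluated left to right.
values = {
    1: 3 * 10 ** 10,
    2: 10 * 3 ** 10,
    3: (10 * 3) ** 10,
    4: 10 / 3 / 10,
    5: 10 / (3 / 10),
}

# Sort the expression numbers by the value each one evaluates to.
ranking = sorted(values, key=values.get)
print(ranking)  # [4, 5, 2, 1, 3]
```

Expressions 1 to 3 use only ints and so stay exact, while 4 and 5 involve division and therefore produce floats.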

16 of 20

Names

17 of 20

Assignment Statements

  • Statements don't have a value; they perform an action
  • An assignment statement changes the meaning of the name to the left of the = symbol
  • The name is bound to a value (not an equation)

more_than_1 = 2 + 3

  • Left of =: a name (more_than_1)
  • Right of =: any expression (2 + 3)
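
A brief sketch of what "bound to a value (not an equation)" means. The name doubled and the value 100 are made up for illustration.

```python
# The expression 2 + 3 is evaluated first; the name is then bound to 5.
more_than_1 = 2 + 3

# A name can be used in later expressions wherever its value is needed.
doubled = more_than_1 * 2

# Re-assignment changes what the name means from this point on.
more_than_1 = 100

# doubled was bound to the value 10, not to the formula more_than_1 * 2,
# so it does not change when more_than_1 is re-assigned.
print(more_than_1, doubled)  # 100 10
```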

(Demo)

18 of 20

Exponential Growth

19 of 20

Growth Rate

  • The rate of increase per unit time
  • After one time unit, a quantity x growing at rate g will be

x * (1 + g)

  • After t time units, a quantity x growing at rate g will be

x * (1 + g) ** t

  • If after and before are measurements of the same quantity taken t time units apart, then the growth rate is

(after/before) ** (1/t) - 1
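
The two formulas undo each other, which can be checked with a small sketch. The function names grow and growth_rate and the numbers used (100, 5%, 10 time units) are assumptions for illustration, not values from the lecture.

```python
def grow(x, g, t):
    """Return the quantity x after t time units of growth at rate g."""
    return x * (1 + g) ** t


def growth_rate(before, after, t):
    """Return the per-unit growth rate implied by two measurements t units apart."""
    return (after / before) ** (1 / t) - 1


# Hypothetical example: start at 100 and grow 5% per time unit for 10 units.
start = 100
final = grow(start, 0.05, 10)

# Recovering the rate from the two measurements gives back roughly 0.05.
recovered = growth_rate(start, final, 10)

print(final)
print(recovered)
```

The recovered rate may differ from 0.05 in its last few digits because of float precision.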

(Demo)

20 of 20

Ebola Epidemic, Sept. 2014

Source: Columbia Prediction of Infectious Diseases, World Health Organization

"It's spreading and growing exponentially," President Obama said.

"This is a disease outbreak that is advancing in an exponential fashion," said Dr. David Nabarro, who is heading the U.N.'s effort against Ebola.

A Frightening Curve: How Fast Is The Ebola Outbreak Growing?