1 of 36

Arrays, NumPy, Indexing, Variables in Data Science

1

Data 6 Summer 2025

LECTURE 04

What it means to work with data.

Developed by students and faculty at UC Berkeley and Tuskegee University

2 of 36

Day 3

Announcements!

  • New midterm date still TBD
  • Homework 1 has been released!

2

3 of 36

Today’s Roadmap

Lecture 04, Data 6 Summer 2025

  1. Arrays
  2. Array Functions
  3. NumPy
  4. Indexing
  5. Variables in Data Science

3

4 of 36

Quick Check Review

4

5 of 36

Arrays

5

6 of 36

Arrays = More Values!

6

7 of 36

An Array Is a Sequential Collection of Values

7

multiple values organized together

arranged like a line/queue

Use make_array() to create arrays.

Values in an array must all be of the same data type, and Python will cast appropriately.

[Examples shown: an array with 4 ints, an array with 4 floats, and an array with 3 strs]
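The casting rule above can be sketched as follows. This is only an illustration: the make_array defined here is a small stand-in built on NumPy so the code runs without the datascience package, and the example values are made up.

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

int_arr = make_array(1, 2, 3, 4)      # four ints
mixed = make_array(1, 2.5, 3, 4)      # ints mixed with one float
str_arr = make_array('a', 'b', 'c')   # three strs

# dtype.kind is 'i' for ints, 'f' for floats, 'U' for strings
print(int_arr.dtype.kind)
print(mixed, mixed.dtype.kind)        # every value was cast to float
print(str_arr.dtype.kind)
```

Because one value in mixed is a float, all four values are stored as floats.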

8 of 36

An Array Is a Sequential Collection of Values

8

Arrays allow us to write code that performs computation on many pieces of data at once.

Python can assign an entire array of values to a single name.

The order of an array is fixed (i.e., the values stay in the order specified when building the array), and values can be repeated.

9 of 36

Side Note: datascience Package

The datascience Python package was written by UC Berkeley specifically for data science education.

We generally put the import statement in a cell at the top of our notebook:

from datascience import *

“Import everything from the datascience package”

  • After running the import statement, we can then call package functions without prepending datascience.
  • The make_array() function is from this package!
  • If you’re concerned about importing multiple functions with the same name from different packages, use “import datascience as ds” and prepend ds. to function calls.

10 of 36

Array Operations

10

11 of 36

American Community Survey (ACS) 2020

The following table is drawn from the 2020 American Community Survey (ACS). It shows education levels of adults aged 25 or older, by state.

We show AL, CA, FL, NY, TX.

11

(Later) How is this data presented, and in what societal context was it analyzed?

(Now) How can we use arrays to analyze this data?

State        Estimated total     Estimated high school     Estimated bachelor's
             state population    graduate or higher (%)    degree or higher (%)
Alabama      3,344,006           86.9                      26.2
California   26,665,143          83.9                      34.7
Florida      15,255,326          88.5                      30.5
New York     13,649,157          87.2                      37.5
Texas        18,449,851          84.4                      30.7

12 of 36

Compute % of Non-HS Graduates by State

12

hs_or_higher = make_array(86.9, 83.9, 88.5, 87.2, 84.4)

below_hs = 100 - hs_or_higher

below_hs

Demo

13 of 36

Arithmetic on Arrays:

Evaluation Returns a New Array

13

⚠️ Evaluating array expressions returns a new array; it does not change the original array.
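A minimal sketch of this point, reusing the hs_or_higher values from the ACS example. The make_array defined here is a stand-in built on NumPy (an assumption, so the code runs without the datascience package).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

hs_or_higher = make_array(86.9, 83.9, 88.5, 87.2, 84.4)

# The expression builds a brand-new array...
below_hs = 100 - hs_or_higher

# ...and the original array still holds its original values.
print(hs_or_higher)
print(below_hs)
```

To keep the result, assign it to a name (as with below_hs); otherwise the new array is computed and then discarded.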

Demo

14 of 36

Arithmetic on Arrays:

Evaluation Returns a New Array

Array Arithmetic is Element-Wise

14

1) Arithmetic with an array and a numeric value

Demo

15 of 36

Element-Wise Arithmetic

15

This element-wise behavior works with all of the arithmetic operations you expect!
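As a quick illustration, each operator below is applied to every element of a small made-up array. The make_array defined here is a NumPy-based stand-in (an assumption) for the datascience function.

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

arr = make_array(1, 2, 3, 4)

added = arr + 10       # add 10 to each element
doubled = arr * 2      # multiply each element by 2
halved = arr / 2       # divide each element by 2 (results are floats)
squared = arr ** 2     # square each element

print(added, doubled, halved, squared)
```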

16 of 36

Estimate # Bachelor Degrees by State

16

bs_or_higher = make_array(26.2, 34.7, 30.5, 37.5, 30.7)

state_pop = make_array(...) # see demo

bs_or_higher / 100 * state_pop
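Here is one way the full computation might look. The population figures are copied from the ACS table (AL, CA, FL, NY, TX, in that order); the make_array helper is a NumPy-based stand-in (an assumption), and the name estimated_bs is introduced only for this sketch.

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

# Order: Alabama, California, Florida, New York, Texas
bs_or_higher = make_array(26.2, 34.7, 30.5, 37.5, 30.7)
state_pop = make_array(3344006, 26665143, 15255326, 13649157, 18449851)

# Convert each percentage to a proportion, then multiply by the
# matching state's population (element by element).
estimated_bs = bs_or_higher / 100 * state_pop
print(estimated_bs)
```

The results are estimates, so the fractional parts are not meaningful; rounding to whole people is reasonable.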

Demo

17 of 36

Arithmetic on Arrays:

Evaluation Returns a New Array

Array Arithmetic is Element-Wise

17

2) Arithmetic with two arrays of equal length (same number of values).
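The equal-length requirement can be seen in a small sketch with made-up values. With the NumPy-based make_array stand-in used here (an assumption), adding a length-3 array to a length-2 array raises a ValueError.

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

a = make_array(1, 2, 3)
b = make_array(10, 20, 30)

# Elements are paired up by position: 1+10, 2+20, 3+30
total = a + b
print(total)

# Arrays of different lengths cannot be paired up.
mismatch_error = None
try:
    a + make_array(1, 2)
except ValueError as err:
    mismatch_error = err
print(mismatch_error)
```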

Demo

18 of 36

Array Functions

18

20 of 36

Standard Functions

20

Call expression format    Example(s)
len(arr)                  len(str_arr)    # 5
                          len(empty_arr)  # 0
max(arr)                  max(str_arr)    # 'yd'
min(arr)                  min(int_arr)    # -4
sum(arr)                  sum(int_arr)    # 6
                          sum(str_arr)    # TypeError

While the function names are identical to what we saw for int/float/strs, the call expressions evaluate differently with our new array data type.
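The calls above can be tried on arrays like the following. The contents of int_arr, str_arr, and empty_arr are not given in the slides, so the values here are hypothetical, chosen only to match the results shown; make_array is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

# Hypothetical contents, picked to reproduce the results in the table
int_arr = make_array(3, -4, 0, 5, 2)
str_arr = make_array('ab', 'c', 'yd', 'x', 'hello')
empty_arr = make_array()

print(len(str_arr), len(empty_arr))   # number of elements
print(max(str_arr))                   # strings compare alphabetically
print(min(int_arr))
print(sum(int_arr))

# Strings cannot be summed, so this raises a TypeError.
sum_failed = False
try:
    sum(str_arr)
except TypeError:
    sum_failed = True
print(sum_failed)
```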

Compare

21 of 36

NumPy

21

22 of 36

NumPy: A Convenient Function Library

Earlier, we computed averages using built-in Python functions:

In [2]:  arr = make_array(30, -40, -4.5, 0, 35)
         avg = sum(arr)/len(arr)
         avg

Out [2]: 4.1

Computing averages of array elements happens a lot in data science! The NumPy package function np.average() is human-readable and convenient:

In [2]:  arr = make_array(30, -40, -4.5, 0, 35)
         avg = np.average(arr)
         avg

Out [2]: 4.1

23 of 36

The NumPy package

NumPy (pronounced “num pie”) is a Python package* with convenient and powerful functions for manipulating arrays.

*For our purposes, “library”, “package”, and “module” all mean similar things.

Anytime we want to use NumPy, we run

In [1]:  import numpy as np

We generally put this import statement at the top of our notebook, then prepend np. to call a NumPy function.

In [2]:  arr = make_array(30, -40, -4.5, 0, 35)
         avg = np.average(arr)
         avg

Out [2]: 4.1

24 of 36

Element-wise NumPy Functions

We’ll point you to NumPy functions as they come up; you don’t need to memorize them. The course website has a list of some of them.

24

Many of these functions work on both arrays and individual numbers.

[Diagram: N-length array → NumPy functions → N-length array]
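As a small sketch of that idea, np.abs and np.floor (two element-wise functions picked for illustration; the slide does not name specific ones) accept either a single number or a whole array. The make_array helper is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

arr = make_array(-1.5, 2.7, -3.2)

# On a single number, the result is a single number.
abs_one = np.abs(-3)
floor_one = np.floor(2.7)

# On an array, the function is applied to each element,
# and the result is an array of the same length.
abs_all = np.abs(arr)
floor_all = np.floor(arr)

print(abs_one, floor_one)
print(abs_all, floor_all)
```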

Demo

25 of 36

Common NumPy Functions

25

NumPy function                   Return value
np.average(arr), np.mean(arr)    The average (i.e., mean) value of arr
np.sum(arr)                      The sum of all elements in arr
np.prod(arr)                     The product of all elements in arr
np.count_nonzero(arr)            The number of elements in arr that are not equal to 0
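A short sketch of these functions on a made-up array; each call reduces the whole array to a single value. The make_array helper is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

arr = make_array(4, 0, 2, 6)

avg = np.average(arr)            # same value as np.mean(arr)
total = np.sum(arr)              # 4 + 0 + 2 + 6
product = np.prod(arr)           # a single 0 makes the whole product 0
nonzero = np.count_nonzero(arr)  # counts 4, 2, and 6

print(avg, total, product, nonzero)
```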

Demo

26 of 36

Even More Functions

26

NumPy function    Return value
np.diff(arr)      The difference between each element of arr and the one before it ((N-1)-length array)
np.cumsum(arr)    The cumulative sum of the elements in arr
np.sqrt(arr)      The square root of each element in arr
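These three functions return arrays rather than single values, which a small made-up example may make clearer. The make_array helper is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

arr = make_array(1, 4, 9, 16)

diffs = np.diff(arr)       # 4-1, 9-4, 16-9: one fewer element than arr
running = np.cumsum(arr)   # 1, 1+4, 1+4+9, 1+4+9+16
roots = np.sqrt(arr)       # square root of each element

print(diffs, running, roots)
```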

Demo

27 of 36

Questions About Functions?

27

Data 6 Python Reference™

28 of 36

Indexing

28

29 of 36

Array Methods

Methods are functions that we call with “dot” syntax. There are several array methods that make it easy to calculate values of interest.

Terminology note: Method calls are where the function operates directly on the array arr.

In these examples, method calls are equivalent to the NumPy package functions.

29

The most common array method is item(), which is used for array indexing.
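The slide's own examples are not reproduced in the text, so the methods below (sum, mean, max, item) are chosen for illustration only. The sketch compares a few method calls with the matching NumPy function calls; make_array is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

arr = make_array(30, -40, -4.5, 0, 35)

# Method call (dot syntax on the array)   vs.   function call
method_sum = arr.sum()
function_sum = np.sum(arr)

method_mean = arr.mean()
function_mean = np.mean(arr)

largest = arr.max()      # method version of the maximum
first = arr.item(0)      # item() retrieves one element by position

print(method_sum, function_sum, method_mean, function_mean, largest, first)
```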

30 of 36

An Element’s Index Is Its Position in an Array

When people stand in a line, each person has a position.

Similarly, each element (i.e., value) of an array has a position – called its index.

Python, like most programming languages, is 0-indexed. This means that in an array, the first element has index 0, not 1.

30

[Figure: seven people standing in a line; Person 1 is at index 0 and Person 7 is at index 6.]

For a length-5 array, the indices are 0 1 2 3 4.

31 of 36

Array Indexing

We can access an element in an array by using its index and the item() method:

arr.item(index)

31

Though int_arr has 5 elements, the largest valid index is 4.
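A minimal sketch of item() on a five-element array. The contents of int_arr are not given in the slides, so the values here are hypothetical; make_array is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

int_arr = make_array(3, -4, 0, 5, 2)   # hypothetical values

first = int_arr.item(0)   # index 0 is the first element
last = int_arr.item(4)    # index 4 is the fifth (last) element

# Index 5 does not exist in a length-5 array, so this raises an IndexError.
out_of_range = False
try:
    int_arr.item(5)
except IndexError:
    out_of_range = True

print(first, last, out_of_range)
```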

Demo

32 of 36

Negative Indexing

We can also “count backwards” using negative indices.

  • -1 corresponds to the last element in an array.
  • -2 corresponds to the second-to-last element in an array.
  • And so on...
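Counting backwards can be sketched like this, on a hypothetical five-element array. The make_array helper is a NumPy-based stand-in (an assumption).

```python
import numpy as np

def make_array(*values):
    # Stand-in for datascience.make_array (assumption: it wraps np.array).
    return np.array(values)

int_arr = make_array(3, -4, 0, 5, 2)   # hypothetical values

last = int_arr.item(-1)          # last element
second_to_last = int_arr.item(-2)
first = int_arr.item(-5)         # in a length-5 array, -5 reaches the first element

print(last, second_to_last, first)
```

For a length-5 array, index -1 and index 4 refer to the same element.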

32

Demo

33 of 36

Questions?

33

34 of 36

In Conclusion…

34

35 of 36

Summary

  • Array functions allow us to operate on and aggregate data in arrays
  • While some functions are built into Python, others come from the NumPy library, which contains many useful array functions
  • To retrieve a specific element/item in an array, you use the element’s index
    • Remember that indices start at 0, not 1

35

36 of 36

Recap

  • Arrays
  • Array Functions
  • NumPy
  • Indexing

Next Time

  • Introduction to social sciences research

36

https://tinyurl.com/data6quickcheck