Arrays, NumPy, Indexing, Variables in Data Science
Data 6 Summer 2025
LECTURE 04
What it means to work with data.
Developed by students and faculty at UC Berkeley and Tuskegee University
Day 3
Announcements!
Today’s Roadmap
Lecture 04, Data 6 Summer 2025
Quick Check Review
Arrays
1. Arrays
2. Array Functions
3. NumPy
4. Indexing
5. Materials-Based Research
Arrays = More Values!
An Array Is a Sequential Collection of Values
Collection: multiple values organized together.
Sequential: arranged in order, like people in a line or queue.
Use make_array() to create arrays.
All values in an array must have the same data type; if you mix types, the values are converted (cast) to a common type.
(Figure: example arrays with 4 ints, 4 floats, and 3 strs.)
Arrays allow us to write code that performs computation on many pieces of data at once.
Python can assign an entire array of values to a single name.
The order of an array is fixed (values stay in the order specified when the array is built), and values can be repeated.
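Here is a minimal sketch of creating arrays and seeing how mixed types are cast. If the datascience package isn't installed, the block falls back to a stand-in make_array built on np.array. That stand-in is an assumption about how the course helper behaves for plain values. All array names and values are hypothetical.

```python
import numpy as np

try:
    from datascience import make_array  # course helper, if installed
except ImportError:
    # Assumed stand-in: collect the arguments into a NumPy array
    def make_array(*elements):
        return np.array(elements)

ints = make_array(1, 2, 3, 4)             # 4 ints
floats = make_array(1.5, 2.0, 3.25, 4.0)  # 4 floats
strs = make_array("cat", "dog", "yak")    # 3 strs

# Mixing types: every value is cast to one common type
int_and_float = make_array(1, 2.5, 3)     # the ints become floats
int_and_str = make_array(1, "two")        # the int becomes the string "1"

# Order is kept exactly as written, and repeats are allowed
repeated = make_array(3, 1, 3)

print(ints.dtype, floats.dtype, strs.dtype)
print(int_and_float, int_and_str, repeated)
```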
Side Note: datascience Package
The datascience Python package was developed at UC Berkeley specifically for data science education.
We generally put the import statement in a cell at the top of our notebook:
from datascience import *
“Import everything from the datascience package”
Array Operations
American Community Survey (ACS) 2020
The following table is drawn from the American Community Survey (ACS) of 2020. It shows the education levels of adults aged 25 years and older, by state.
We show five states: Alabama (AL), California (CA), Florida (FL), New York (NY), and Texas (TX).
(Later) How is this data presented, and in what societal context was it analyzed?
(Now) How can we use arrays to analyze this data?
State | Estimated population 25 years and over | Estimated high school graduate or higher (%) | Estimated bachelor's degree or higher (%) |
Alabama | 3,344,006 | 86.9 | 26.2 |
California | 26,665,143 | 83.9 | 34.7 |
Florida | 15,255,326 | 88.5 | 30.5 |
New York | 13,649,157 | 87.2 | 37.5 |
Texas | 18,449,851 | 84.4 | 30.7 |
Compute % of Non-HS Graduates by State
hs_or_higher = make_array(86.9, 83.9, 88.5, 87.2, 84.4)
below_hs = 100 - hs_or_higher
below_hs
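This is a runnable version of the computation above, using the high-school percentages from the ACS table. The make_array fallback is an assumption, used only if the datascience package isn't installed.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

# % of adults with a high school diploma or higher: AL, CA, FL, NY, TX
hs_or_higher = make_array(86.9, 83.9, 88.5, 87.2, 84.4)

# 100 minus an array subtracts each element from 100
below_hs = 100 - hs_or_higher

# Round for display; raw floats can show tiny floating-point noise
print(np.round(below_hs, 1))
```

Reading the result: about 13.1% of adults 25 and older in Alabama did not finish high school, compared with about 16.1% in California.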
Arithmetic on Arrays:
Evaluation Returns a New Array
⚠️ Evaluating array expressions returns a new array; it does not change the original array.
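The following small sketch illustrates this warning with hypothetical values. The expression builds a new array, and the name keeps referring to the old one until you reassign it. As before, the make_array fallback is an assumed stand-in for the datascience helper.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

values = make_array(1, 2, 3)
shifted = values + 10    # a brand-new array holding 11, 12, 13
print(values)            # still 1, 2, 3: the original was not changed

values = values + 10     # to keep the result, assign it to a name
print(values)            # now 11, 12, 13
```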
Array Arithmetic is Element-Wise
1) Arithmetic with an array and a numeric value
Element-Wise Arithmetic
This element-wise behavior works with all of the arithmetic operators you already know: +, -, *, /, //, %, and **.
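Here is a quick sketch of each operator applied to an array and a single number, using hypothetical values. The comments list the resulting values; NumPy's printed formatting differs slightly. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

a = make_array(2, 4, 6, 8)   # hypothetical example values

print(a + 1)    # 3, 5, 7, 9
print(a - 1)    # 1, 3, 5, 7
print(a * 3)    # 6, 12, 18, 24
print(a / 4)    # 0.5, 1.0, 1.5, 2.0  (true division gives floats)
print(a // 4)   # 0, 1, 1, 2          (floor division)
print(a % 4)    # 2, 0, 2, 0          (remainder)
print(a ** 2)   # 4, 16, 36, 64       (exponent)
```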
Estimate # Bachelor Degrees by State
bs_or_higher = make_array(26.2, 34.7, 30.5, 37.5, 30.7)
state_pop = make_array(...) # see demo
bs_or_higher / 100 * state_pop
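This is a runnable sketch of the estimate above. The slide elides the contents of state_pop. Here we assume it holds the population column of the ACS table (adults 25 and over), in the same state order. The make_array fallback is also an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

# Columns from the ACS table, in the order AL, CA, FL, NY, TX
bs_or_higher = make_array(26.2, 34.7, 30.5, 37.5, 30.7)   # percent
# Assumption: state_pop is the table's population column
state_pop = make_array(3344006, 26665143, 15255326, 13649157, 18449851)

# Percent -> fraction, then multiply the two arrays position by position
est_bachelors = bs_or_higher / 100 * state_pop
print(np.round(est_bachelors))   # about 876,130 for AL and 9,252,805 for CA
```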
Array Arithmetic is Element-Wise
1) Arithmetic with an array and a numeric value
2) Arithmetic with two arrays of equal length (same number of values).
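The following sketch uses hypothetical values to show two equal-length arrays combined position by position. When the lengths differ (here 5 and 3), NumPy refuses and raises a ValueError. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

a = make_array(1, 2, 3, 4, 5)
b = make_array(10, 20, 30, 40, 50)

total = a + b    # pairs elements by position: 11, 22, 33, 44, 55
ratio = b / a    # 10.0 at every position

short = make_array(1, 2, 3)
try:
    a + short    # lengths 5 and 3 cannot be paired up
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```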
Array Functions
Standard Functions
Call expression format | Example(s) |
len(arr) | len(str_arr) # 5; len(empty_arr) # 0 |
max(arr) | max(str_arr) # 'yd' |
min(arr) | min(int_arr) # -4 |
sum(arr) | sum(int_arr) # 6; sum(str_arr) # TypeError |
While the function names are identical to what we saw for ints, floats, and strs, the call expressions evaluate differently with our new array data type: each function works across all of the array's elements.
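The slide doesn't define str_arr, int_arr, or empty_arr. The arrays below are hypothetical stand-ins chosen so that the results match the table's examples. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

int_arr = make_array(2, -4, 3, 0, 5)                  # hypothetical: 5 ints summing to 6
str_arr = make_array("ab", "yd", "cat", "Zoo", "hi")  # hypothetical: 5 strs
empty_arr = make_array()                              # no values at all

print(len(str_arr))    # 5
print(len(empty_arr))  # 0
print(max(str_arr))    # yd (strings compare alphabetically; uppercase sorts before lowercase)
print(min(int_arr))    # -4
print(sum(int_arr))    # 6

try:
    sum(str_arr)       # sum starts from 0, and 0 + a string is not defined
    sum_failed = False
except TypeError as err:
    sum_failed = True
    print("TypeError:", err)
```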
NumPy
NumPy: A Convenient Function Library
Earlier, we computed averages using built-in Python functions:
arr = make_array(30, -40, -4.5, 0, 35)
avg = sum(arr) / len(arr)
avg    # 4.1
Computing averages of array elements happens a lot in data science!
The NumPy package function np.average() is human-readable and convenient:
arr = make_array(30, -40, -4.5, 0, 35)
avg = np.average(arr)
avg    # 4.1
The NumPy package
NumPy (pronounced “num pie”) is a Python package* with convenient and powerful functions for manipulating arrays.
*For our purposes, “library”, “package”, and “module” all mean similar things.
Anytime we want to use NumPy, we run:
import numpy as np
We generally put this import statement at the top of our notebook, then prepend np. to call a NumPy function.
Element-wise NumPy Functions
We’ll point you to NumPy functions as they come up; you don’t need to memorize them. The course website has a list of some of them.
Many of these functions work on both arrays and individual numbers.
(Figure: a NumPy function takes an N-length array and returns an N-length array.)
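Here is a small sketch of element-wise NumPy functions applied both to arrays and to single numbers. np.abs isn't named on these slides; it's used here as an illustration, alongside np.sqrt. The values and the make_array fallback are assumptions.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

arr = make_array(-3, 0, 2)
print(np.abs(arr))        # 3, 0, 2: applied to each element; length stays 3
print(np.abs(-7))         # 7: the same function on a single number

squares = make_array(25, 36, 49)
print(np.sqrt(squares))   # 5.0, 6.0, 7.0
print(np.sqrt(16))        # 4.0
```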
Common NumPy Functions
NumPy function | Return value |
np.average(arr), np.mean(arr) | The average (i.e., mean) value of arr |
np.sum(arr) | The sum of all elements in arr |
np.prod(arr) | The product of all elements in arr |
np.count_nonzero(arr) | The number of elements in arr that are not equal to 0 |
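The following sketch runs each function from this table on the values from the averaging example. The make_array fallback is an assumed stand-in for the datascience helper.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

arr = make_array(30, -40, -4.5, 0, 35)   # same values as the averaging example

print(np.average(arr))                # 4.1
print(np.mean(arr))                   # 4.1 as well
print(np.sum(arr))                    # 20.5
print(np.prod(make_array(2, 3, 4)))   # 24
print(np.count_nonzero(arr))          # 4, since only the 0 is excluded
```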
Even More Functions
NumPy function | Return value |
np.diff(arr) | The difference between each element of arr and the previous element (an array of length N-1) |
np.cumsum(arr) | The cumulative (running) sum of the elements of arr (an array of length N) |
np.sqrt(arr) | The square root of each element of arr (an array of length N) |
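Here is a short sketch of these three functions on a hypothetical length-4 array. Note how np.diff returns one fewer element than its input. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

arr = make_array(1, 4, 9, 16)

print(np.diff(arr))     # 3, 5, 7: each element minus the one before (length N-1 = 3)
print(np.cumsum(arr))   # 1, 5, 14, 30: running total (length N = 4)
print(np.sqrt(arr))     # 1.0, 2.0, 3.0, 4.0: square root of each element
```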
Questions About Functions?
Data 6 Python Reference™
Indexing
Array Methods
Methods are functions that we call with “dot” syntax, as in arr.sum(). There are several array methods that make it easy to calculate values of interest.
Terminology note: in a method call, the function operates directly on the array arr that comes before the dot.
For common calculations, a method call gives the same result as the corresponding NumPy package function (for example, arr.sum() and np.sum(arr)).
The most common array method we'll use is item(), which is used for array indexing.
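The slide's own method examples aren't in this text. The following sketch compares a few standard NumPy array methods (sum, mean, item) with the matching package functions. Which methods the slide actually showed is an assumption. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

arr = make_array(30, -40, -4.5, 0, 35)

print(arr.sum(), np.sum(arr))     # 20.5 20.5: method and function agree
print(arr.mean(), np.mean(arr))   # 4.1 4.1
print(arr.item(0))                # 30.0: item() returns the element at index 0
```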
An Element’s Index Is Its Position in an Array
When people stand in a line, each person has a position.
Similarly, each element (i.e., value) of an array has a position – called its index.
Python, like most programming languages, is 0-indexed. This means that in an array, the first element has index 0, not 1.
(Figure: a line of 7 people, where Person 1 is at index 0 and Person 7 is at index 6.)
For a length-5 array, the indices are 0, 1, 2, 3, 4.
Array Indexing
We can access an element in an array by using its index and the item() method:
arr.item(index)
Though int_arr has 5 elements, the largest valid index is 4.
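Here is a minimal sketch of indexing with item(). The values in int_arr are hypothetical (the slide doesn't list them). Asking for index 5 in a 5-element array raises an IndexError. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

int_arr = make_array(2, -4, 3, 0, 5)   # hypothetical values for a 5-element int_arr

first = int_arr.item(0)   # 2: index 0 is the first element
last = int_arr.item(4)    # 5: the largest valid index is len(int_arr) - 1 = 4

try:
    int_arr.item(5)       # there is no sixth element
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)
```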
Negative Indexing
We can also “count backwards” from the end of an array using negative indices: index -1 is the last element, -2 is the second-to-last, and so on.
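Here is a short sketch of negative indexing with item(), using the same hypothetical int_arr values. The make_array fallback is an assumed stand-in.

```python
import numpy as np

try:
    from datascience import make_array
except ImportError:
    # Assumed stand-in for the datascience helper
    def make_array(*elements):
        return np.array(elements)

int_arr = make_array(2, -4, 3, 0, 5)   # hypothetical values

last = int_arr.item(-1)            # 5: the last element
second_to_last = int_arr.item(-2)  # 0
also_first = int_arr.item(-5)      # 2: index -len(int_arr) is the same position as index 0
```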
Questions?
In Conclusion…
Summary
Recap
Next Time
https://tinyurl.com/data6quickcheck