Optimizing Machine Learning Code in NumPy & SciPy
Before we start…
Optimizing Machine Learning Code in NumPy & SciPy
Presented by Nicholas Vadivelu (nicholas.vadivelu@gmail.com, nicv#1748)
3
Workshop Overview
Motivation:
Prereqs:
Goals:
4
How familiar are you with NumPy?
5
How familiar are you with K-means?
6
Workshop Outline
7
Why not Python?
8
Why NumPy (& SciPy)?
9
Rules for fast NumPy (SciPy)
BENCHMARK
11
A quick note on the memory model…
>> array
[10, 20, 30, 40, 50, 60, 70, 80]
>> array.reshape((4, 2))
[[10, 20], [30, 40], [50, 60], [70, 80]]
>> array.reshape((4, 2))[::2]
[[10, 20], [50, 60]]
>> array.reshape((4, 2)).T
[[10, 30, 50, 70], [20, 40, 60, 80]]
12
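The memory-model point above can be checked directly: reshape, strided slicing, and transpose all return views over the same buffer, not copies (a small sketch, not on the slides):

```python
import numpy as np

array = np.arange(10, 90, 10)     # [10 20 30 40 50 60 70 80]

reshaped = array.reshape((4, 2))  # view: same buffer, new shape/strides
strided = reshaped[::2]           # view: larger row stride, still no copy
transposed = reshaped.T           # view: strides swapped, no data moved

# All three share memory with the original array.
assert np.shares_memory(array, reshaped)
assert np.shares_memory(array, strided)
assert np.shares_memory(array, transposed)
```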
K-Means “in 5 minutes”
13
14
What is K-Means?
15
How does it work?
Input: Dataset and the # of clusters (K)
Goal: Assign each point to one of the K clusters
Process: repeat two steps: (1) assign each point to its nearest centroid, (2) move each centroid to the mean of its assigned points
K-Means Clustering: Demo
K = 3, Iteration = 0 → 5
20
21
Initial Implementation
def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    labels = np.zeros(n)  # (n,)
    for _ in range(num_iter):
        # ASSIGNMENT STEP
        for i, point in enumerate(data):
            distances = [np.linalg.norm(point - c) for c in centroids]
            labels[i] = np.argmin(distances)  # idx of closest centroid
        # UPDATE STEP (average within each group)
        centroids = np.stack([data[labels==i].mean(axis=0)
                              for i in range(k)])  # (k, d)
    return centroids
22
Setup Steps
n, d = data.shape
centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
labels = np.zeros(n)  # (n,)
23
Fixed Iteration Count (instead of a stopping condition)
for _ in range(num_iter):  # fixed number of iterations, no convergence check
24
Assignment Step
# ASSIGNMENT STEP
for i, point in enumerate(data):
    distances = [np.linalg.norm(point - c) for c in centroids]
    labels[i] = np.argmin(distances)  # idx of closest centroid
25
Update Step
# UPDATE STEP (average within each group)
centroids = np.stack([data[labels==i].mean(axis=0)
                      for i in range(k)])  # (k, d)
26
Return
return centroids
27
Refactored Implementation
def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    for _ in range(num_iter):
        labels = assignment_step(data, centroids)  # (n,)
        centroids = update_step(data, labels, k)  # (k, d)
    return centroids

n, k, d = 5000, 26, 26
data = np.random.uniform(size=(n, d))  # dummy data
%timeit kmeans(data, k)  # magic function from Jupyter Notebooks
>> 1 loop, best of 5: 51.7 s per loop
28
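The refactored kmeans calls assignment_step and update_step, which the rest of the workshop optimizes; for reference, here are the naive versions factored straight out of the initial implementation:

```python
import numpy as np

def assignment_step(data, centroids):
    # Label each point with the index of its nearest centroid.
    labels = np.empty(data.shape[0], dtype=int)  # (n,)
    for i, point in enumerate(data):
        distances = [np.linalg.norm(point - c) for c in centroids]
        labels[i] = np.argmin(distances)
    return labels

def update_step(data, labels, k):
    # New centroid = mean of the points assigned to each cluster.
    return np.stack([data[labels == i].mean(axis=0) for i in range(k)])
```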
Try it out!
29
30
How fast were you able to make kmeans/assignment_step/update_step?
Optimizing the Assignment Step
31
32
Original Assignment Step
def assignment_step_v1(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = [np.linalg.norm(point - c) for c in centroids]
        labels[i] = np.argmin(distances)
    return labels
centroids = data[np.random.choice(n, k, replace=False)] # (k, d)
%timeit assignment_step_v1(data, centroids)
>> 1 loop, best of 5: 1.01 s per loop
33
Broadcasting Distance Computation
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
%timeit assignment_step_v2(data, centroids)
>> 10 loops, best of 5: 72.6 ms per loop
35
What is “Broadcasting”?
36
Broadcasting Rules
e.g.
a: 4 x 3        a: 4 x 3        a: 4 x 3
b:     3   →    b: 1 x 3   →    b: 4 x 3
42
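The rule walk-through above in code: the (3,) array is treated as 1 x 3, then stretched to 4 x 3 so each row gets the same addend.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # shape (4, 3)
b = np.array([10, 20, 30])       # shape (3,)

# b broadcasts: (3,) → (1, 3) → (4, 3); added to every row of a.
c = a + b
assert c.shape == (4, 3)
assert c[0].tolist() == [10, 21, 32]
```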
Broadcasting Fail Example
e.g.
a: 4 x 3
b: 1 x 4
43
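The failing shapes above, reproduced: the trailing dimensions are 3 and 4, and neither is 1, so broadcasting raises.

```python
import numpy as np

a = np.ones((4, 3))
b = np.ones((1, 4))
try:
    a + b  # trailing dims 3 vs 4: neither is 1, so this cannot broadcast
except ValueError as e:
    print("broadcast failed:", e)
```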
Broadcasting Example 2
e.g.
a: 4 x 1        a: 4 x 1        a: 4 x 3
b:     3   →    b: 1 x 3   →    b: 4 x 3
48
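Example 2 in code: a (4, 1) against a (3,) stretches on both sides, producing an "outer sum" of shape (4, 3).

```python
import numpy as np

a = np.arange(4)[:, None]   # shape (4, 1): column [0, 1, 2, 3]
b = np.array([10, 20, 30])  # shape (3,)

# (4, 1) vs (1, 3): both length-1 axes stretch → (4, 3).
c = a + b
assert c.shape == (4, 3)
assert c[1].tolist() == [11, 21, 31]
```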
Audience Q&A Session
ⓘ Start presenting to display the audience questions on this slide.
49
Broadcasting Distance Computation
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
point    :     d →     d → 1 x d → k x d
centroids: k x d → k x d → k x d → k x d
point - centroids: k x d
51
Assignment Step v2: Some Broadcasting
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
%timeit assignment_step_v2(data, centroids)
>> 10 loops, best of 5: 72.6 ms per loop
52
Assignment Step v3
def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.linalg.norm(diff, axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
%timeit assignment_step_v3(data, centroids)
>> 100 loops, best of 5: 18.1 ms per loop
Note:
data     : n x d → data[:, None]     : n x 1 x d
centroids: k x d → centroids[None]   : 1 x k x d
54
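The None-indexing from the note above, in isolation (tiny shapes for clarity: n=5, k=3, d=2):

```python
import numpy as np

data = np.zeros((5, 2))       # n=5 points, d=2 features
centroids = np.zeros((3, 2))  # k=3 centroids

# None (i.e. np.newaxis) inserts a length-1 axis so broadcasting can stretch it:
assert data[:, None].shape == (5, 1, 2)
assert centroids[None].shape == (1, 3, 2)
assert (data[:, None] - centroids[None]).shape == (5, 3, 2)  # (n, k, d)
```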
Audience Q&A Session
ⓘ Start presenting to display the audience questions on this slide.
55
Assignment Step v3 - Improving Norm
def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.sqrt((diff**2).sum(axis=2))  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
Do we need that square root?
58
Assignment Step v4 - Improving Norm
def assignment_step_v4(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = (diff**2).sum(axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
Do we need that square root? No: sqrt is monotonic, so dropping it doesn't change the argmin.
%timeit assignment_step_v4(data, centroids)
>> 100 loops, best of 5: 17.8 ms per loop
59
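A quick check of the monotonicity claim above (not on the slides): with and without the square root, every argmin comes out the same.

```python
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(100, 4, 3))   # dummy (n, k, d) differences
sq = (diff**2).sum(axis=2)            # squared distances, (n, k)

# sqrt is monotonically increasing, so it preserves the ordering of the
# distances within each row, and therefore the argmin.
assert np.array_equal(np.argmin(np.sqrt(sq), axis=1),
                      np.argmin(sq, axis=1))
```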
Assignment Step v5 - Einsum
def assignment_step_v5(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.einsum('nkd,nkd->nk', diff, diff)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
%timeit assignment_step_v5(data, centroids)
>> 100 loops, best of 5: 9.94 ms per loop
60
What is Einsum?
61
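The slide's visual is an image; in code, einsum labels each axis of its operands, and a label that is repeated across inputs but absent from the output gets summed over. A few identities that capture the idea (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 3))
v = rng.normal(size=5)

# Repeated label 'j' absent from the output → summed over (a dot product).
matmul = np.einsum('ij,jk->ik', A, B)   # same as A @ B
matvec = np.einsum('ij,j->i', A, v)     # same as A @ v
row_sq = np.einsum('ij,ij->i', A, A)    # same as (A * A).sum(axis=1)
trans  = np.einsum('ij->ji', A)         # same as A.T
```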
Optimizing the Update Step
62
63
Original Update Step - Time Complexity
def update_step_v1(data, labels, k):  # returns centroids
    return np.stack([data[labels==i].mean(axis=0) for i in range(k)])

labels = np.random.randint(0, k, size=n)  # dummy labels
%timeit update_step_v1(data, labels, k)
>> 1000 loops, best of 5: 1.15 ms per loop
Time Complexity:
labels==i → O(n)
data[labels==i] → O(nd)
data[labels==i].mean(axis=0) → O(nd)
[data[labels==i].mean(axis=0) for i in range(k)] → O(ndk)
65
One-hot matrix
labels
>> [1, 0, 2, 2, 1]
np.eye(k)
>> [[1. 0. 0.]
    [0. 1. 0.]
    [0. 0. 1.]]
np.eye(k)[labels]  # onehot
>> [[0. 1. 0.]
    [1. 0. 0.]
    [0. 0. 1.]
    [0. 0. 1.]
    [0. 1. 0.]]
72
Sum by Centroid Group (n=5, d=2, k=3)
onehot.T              @  data       =  sum_by_cluster
[[0. 1. 0. 0. 0.]        [[8. 0.]      [[ 8.  6.]
 [1. 0. 0. 0. 1.]    @    [8. 6.]   =   [17.  3.]
 [0. 0. 1. 1. 0.]]        [1. 7.]       [ 6. 15.]]
                          [5. 8.]
                          [9. 3.]]
Recall: labels = [1, 0, 2, 2, 1]
77
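The multiplication above can be verified directly: transposing the one-hot matrix and multiplying by the data sums the rows of data within each cluster.

```python
import numpy as np

labels = np.array([1, 0, 2, 2, 1])
data = np.array([[8., 0.], [8., 6.], [1., 7.], [5., 8.], [9., 3.]])
k = 3

onehot = np.eye(k)[labels]  # (5, 3): row i selects cluster labels[i]
sums = onehot.T @ data      # (3, 2): per-cluster sums of the data rows
assert sums.tolist() == [[8., 6.], [17., 3.], [6., 15.]]
```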
Number of Points per Cluster
78
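The slide's figure is an image; counting points per cluster is an O(n) np.bincount call (a minimal sketch):

```python
import numpy as np

labels = np.array([1, 0, 2, 2, 1])

# np.bincount counts occurrences of each integer label in one O(n) pass;
# minlength guards against clusters with no points at the high end.
counts = np.bincount(labels, minlength=3)
assert counts.tolist() == [1, 2, 2]  # one point in cluster 0, two each in 1 and 2
```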
Putting it all together…
def update_step_matmul(data, labels, k):  # returns centroids
    onehot = np.eye(k)[labels]  # (n, k)
    cluster_sums = onehot.T @ data  # (k, d)
    cluster_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return cluster_sums / cluster_counts  # (k, d)
%timeit update_step_matmul(data, labels, k)
>> 1000 loops, best of 5: 1.03 ms per loop
Tiny improvement… 😔
Time complexity is still O(ndk) because of the matmul.
80
81
What now?
82
Sparse Matrices
83
Sparse Update Step
from scipy import sparse

def update_step_csc(data, labels, k):
    n, d = data.shape
    # constructor: (values, (row_indices, column_indices))
    onehotT = sparse.csc_matrix(
        (np.ones(n), (labels, np.arange(n))), shape=(k, n))
    group_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return onehotT @ data / group_counts

%timeit update_step_csc(data, labels, k)
>> 1000 loops, best of 5: 491 µs per loop
Time complexity: O(d(n+k))
84
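A quick check (not on the slides) that the sparse update matches the original loop-based update on dummy data: the sparse one-hot stores only n nonzeros instead of an n x k dense matrix, so the matmul costs O(nd) rather than O(ndk).

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, d, k = 1000, 8, 5
data = rng.uniform(size=(n, d))
labels = rng.integers(0, k, size=n)

# Sparse transposed one-hot: one stored 1.0 per point.
onehotT = sparse.csc_matrix(
    (np.ones(n), (labels, np.arange(n))), shape=(k, n))
counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
sparse_centroids = onehotT @ data / counts          # (k, d)

# Reference: the original per-cluster mean loop.
dense_centroids = np.stack([data[labels == i].mean(axis=0) for i in range(k)])
assert np.allclose(sparse_centroids, dense_centroids)
```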
Review
85
86
Resources for Further Learning
Thanks for coming!
Fill out the feedback form to be entered in a raffle + get food! bit.ly/w22-raffle-feedback-form
87