Optimizing Machine Learning Code in NumPy & SciPy
Before we start…
Optimizing Machine Learning Code in NumPy & SciPy
Presented by Nicholas Vadivelu (nicholas.vadivelu@gmail.com, nicv#1748)
3
Workshop Overview
Motivation:
Prereqs:
Goals:
4
How familiar are you with NumPy?
5
How familiar are you with K-means?
6
Workshop Outline
7
Why not Python?
8
Why NumPy (& SciPy)?
9
Rules for fast NumPy (SciPy)
BENCHMARK
11
A quick note on the memory model…
>> array
[10, 20, 30, 40, 50, 60, 70, 80]
>> array.reshape((4, 2))
[[10, 20], [30, 40], [50, 60], [70, 80]]
>> array.reshape((4, 2))[::2]
[[10, 20], [50, 60]]
>> array.reshape((4, 2)).T
[[10, 30, 50, 70], [20, 40, 60, 80]]
12
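The memory-model point above can be checked directly: reshape, strided slicing, and transpose all return views over the same buffer, not copies (a small sketch, not on the slides):

```python
import numpy as np

array = np.arange(10, 90, 10)     # [10 20 30 40 50 60 70 80]

reshaped = array.reshape((4, 2))  # view: same buffer, new shape/strides
strided = reshaped[::2]           # view: larger row stride, still no copy
transposed = reshaped.T           # view: strides swapped, no data moved

# All three share memory with the original array.
assert np.shares_memory(array, reshaped)
assert np.shares_memory(array, strided)
assert np.shares_memory(array, transposed)
```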
K-Means “in 5 minutes”
13
14
What is K-Means?
15
How does it work?
Input: Dataset and the # of clusters (K)
Goal: Assign each point to one of the K clusters
Process: repeat two steps: (1) assign each point to its nearest centroid, (2) move each centroid to the mean of its assigned points
K-Means Clustering: Demo
K = 3, Iteration = 0 → 5
20
21
Initial Implementation
def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    labels = np.zeros(n)  # (n,)
    for _ in range(num_iter):
        # ASSIGNMENT STEP
        for i, point in enumerate(data):
            distances = [np.linalg.norm(point - c) for c in centroids]
            labels[i] = np.argmin(distances)  # idx of closest centroid
        # UPDATE STEP (average within each group)
        centroids = np.stack([data[labels==i].mean(axis=0)
                              for i in range(k)])  # (k, d)
    return centroids
22
Setup Steps
n, d = data.shape
centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
labels = np.zeros(n)  # (n,)
23
Fixed Iteration Count (instead of a stopping condition)
for _ in range(num_iter):  # fixed number of iterations, no convergence check
24
Assignment Step
# ASSIGNMENT STEP
for i, point in enumerate(data):
    distances = [np.linalg.norm(point - c) for c in centroids]
    labels[i] = np.argmin(distances)  # idx of closest centroid
25
Update Step
# UPDATE STEP (average within each group)
centroids = np.stack([data[labels==i].mean(axis=0)
                      for i in range(k)])  # (k, d)
26
Return
return centroids
27
Refactored Implementation
def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    for _ in range(num_iter):
        labels = assignment_step(data, centroids)  # (n,)
        centroids = update_step(data, labels, k)  # (k, d)
    return centroids

n, k, d = 5000, 26, 26
data = np.random.uniform(size=(n, d))  # dummy data
%timeit kmeans(data, k)  # magic function from Jupyter Notebooks
>> 1 loop, best of 5: 51.7 s per loop
28
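The refactored kmeans calls assignment_step and update_step, which the rest of the workshop optimizes; for reference, here are the naive versions factored straight out of the initial implementation:

```python
import numpy as np

def assignment_step(data, centroids):
    # Label each point with the index of its nearest centroid.
    labels = np.empty(data.shape[0], dtype=int)  # (n,)
    for i, point in enumerate(data):
        distances = [np.linalg.norm(point - c) for c in centroids]
        labels[i] = np.argmin(distances)
    return labels

def update_step(data, labels, k):
    # New centroid = mean of the points assigned to each cluster.
    return np.stack([data[labels == i].mean(axis=0) for i in range(k)])
```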
Try it out!
29
30
How fast were you able to make kmeans/assignment_step/update_step?
Optimizing the Assignment Step
31
32
Original Assignment Step
def assignment_step_v1(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = [np.linalg.norm(point - c) for c in centroids]
        labels[i] = np.argmin(distances)
    return labels
centroids = data[np.random.choice(n, k, replace=False)] # (k, d)
%timeit assignment_step_v1(data, centroids)
>> 1 loop, best of 5: 1.01 s per loop
33
Broadcasting Distance Computation
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
%timeit assignment_step_v2(data, centroids)
>> 10 loops, best of 5: 72.6 ms per loop
35
What is “Broadcasting”?
36
Broadcasting Rules
e.g.
a: 4 x 3        a: 4 x 3        a: 4 x 3
b:     3   →    b: 1 x 3   →    b: 4 x 3
42
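The rule walk-through above in code: the (3,) array is treated as 1 x 3, then stretched to 4 x 3 so each row gets the same addend.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # shape (4, 3)
b = np.array([10, 20, 30])       # shape (3,)

# b broadcasts: (3,) → (1, 3) → (4, 3); added to every row of a.
c = a + b
assert c.shape == (4, 3)
assert c[0].tolist() == [10, 21, 32]
```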
Broadcasting Fail Example
e.g.
a: 4 x 3
b: 1 x 4
43
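The failing shapes above, reproduced: the trailing dimensions are 3 and 4, and neither is 1, so broadcasting raises.

```python
import numpy as np

a = np.ones((4, 3))
b = np.ones((1, 4))
try:
    a + b  # trailing dims 3 vs 4: neither is 1, so this cannot broadcast
except ValueError as e:
    print("broadcast failed:", e)
```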
Broadcasting Example 2
e.g.
a: 4 x 1        a: 4 x 1        a: 4 x 3
b:     3   →    b: 1 x 3   →    b: 4 x 3
48
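Example 2 in code: a (4, 1) against a (3,) stretches on both sides, producing an "outer sum" of shape (4, 3).

```python
import numpy as np

a = np.arange(4)[:, None]   # shape (4, 1): column [0, 1, 2, 3]
b = np.array([10, 20, 30])  # shape (3,)

# (4, 1) vs (1, 3): both length-1 axes stretch → (4, 3).
c = a + b
assert c.shape == (4, 3)
assert c[1].tolist() == [11, 21, 31]
```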
Audience Q&A Session
ⓘ Start presenting to display the audience questions on this slide.
49
Broadcasting Distance Computation
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
point    :     d →     d → 1 x d → k x d
centroids: k x d → k x d → k x d → k x d
point - centroids: k x d
51
Assignment Step v2: Some Broadcasting
def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels
%timeit assignment_step_v2(data, centroids)
>> 10 loops, best of 5: 72.6 ms per loop
52
Assignment Step v3
def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.linalg.norm(diff, axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
%timeit assignment_step_v3(data, centroids)
>> 100 loops, best of 5: 18.1 ms per loop
Note:
data     : n x d → data[:, None]     : n x 1 x d
centroids: k x d → centroids[None]   : 1 x k x d
54
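The None-indexing from the note above, in isolation (tiny shapes for clarity: n=5, k=3, d=2):

```python
import numpy as np

data = np.zeros((5, 2))       # n=5 points, d=2 features
centroids = np.zeros((3, 2))  # k=3 centroids

# None (i.e. np.newaxis) inserts a length-1 axis so broadcasting can stretch it:
assert data[:, None].shape == (5, 1, 2)
assert centroids[None].shape == (1, 3, 2)
assert (data[:, None] - centroids[None]).shape == (5, 3, 2)  # (n, k, d)
```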
Audience Q&A Session
ⓘ Start presenting to display the audience questions on this slide.
55
Assignment Step v3 - Improving Norm
def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.sqrt((diff**2).sum(axis=2))  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
Do we need that square root?
58
Assignment Step v4 - Improving Norm
def assignment_step_v4(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = (diff**2).sum(axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
Do we need that square root? No: sqrt is monotonic, so dropping it doesn't change the argmin.
%timeit assignment_step_v4(data, centroids)
>> 100 loops, best of 5: 17.8 ms per loop
59
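A quick check of the monotonicity claim above (not on the slides): with and without the square root, every argmin comes out the same.

```python
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(100, 4, 3))   # dummy (n, k, d) differences
sq = (diff**2).sum(axis=2)            # squared distances, (n, k)

# sqrt is monotonically increasing, so it preserves the ordering of the
# distances within each row, and therefore the argmin.
assert np.array_equal(np.argmin(np.sqrt(sq), axis=1),
                      np.argmin(sq, axis=1))
```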
Assignment Step v5 - Einsum
def assignment_step_v5(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.einsum('nkd,nkd->nk', diff, diff)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels
%timeit assignment_step_v5(data, centroids)
>> 100 loops, best of 5: 9.94 ms per loop
60
What is Einsum?
61
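The slide's visual is an image; in code, einsum labels each axis of its operands, and a label that is repeated across inputs but absent from the output gets summed over. A few identities that capture the idea (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 3))
v = rng.normal(size=5)

# Repeated label 'j' absent from the output → summed over (a dot product).
matmul = np.einsum('ij,jk->ik', A, B)   # same as A @ B
matvec = np.einsum('ij,j->i', A, v)     # same as A @ v
row_sq = np.einsum('ij,ij->i', A, A)    # same as (A * A).sum(axis=1)
trans  = np.einsum('ij->ji', A)         # same as A.T
```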
Optimizing the Update Step
62
63
Original Update Step - Time Complexity
def update_step_v1(data, labels, k):  # returns centroids
    return np.stack([data[labels==i].mean(axis=0) for i in range(k)])

labels = np.random.randint(0, k, size=n)  # dummy labels
%timeit update_step_v1(data, labels, k)
>> 1000 loops, best of 5: 1.15 ms per loop
Time Complexity:
labels==i → O(n)
data[labels==i] → O(nd)
data[labels==i].mean(axis=0) → O(nd)
[data[labels==i].mean(axis=0) for i in range(k)] → O(ndk)
65
One-hot matrix
labels
>> [1, 0, 2, 2, 1]
np.eye(k)
>> [[1. 0. 0.]
    [0. 1. 0.]
    [0. 0. 1.]]
np.eye(k)[labels]  # onehot
>> [[0. 1. 0.]
    [1. 0. 0.]
    [0. 0. 1.]
    [0. 0. 1.]
    [0. 1. 0.]]
72
Sum by Centroid Group (n=5, d=2, k=3)
onehot.T              @  data       =  sum_by_cluster
[[0. 1. 0. 0. 0.]        [[8. 0.]      [[ 8.  6.]
 [1. 0. 0. 0. 1.]    @    [8. 6.]   =   [17.  3.]
 [0. 0. 1. 1. 0.]]        [1. 7.]       [ 6. 15.]]
                          [5. 8.]
                          [9. 3.]]
Recall: labels = [1, 0, 2, 2, 1]
77
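The multiplication above can be verified directly: transposing the one-hot matrix and multiplying by the data sums the rows of data within each cluster.

```python
import numpy as np

labels = np.array([1, 0, 2, 2, 1])
data = np.array([[8., 0.], [8., 6.], [1., 7.], [5., 8.], [9., 3.]])
k = 3

onehot = np.eye(k)[labels]  # (5, 3): row i selects cluster labels[i]
sums = onehot.T @ data      # (3, 2): per-cluster sums of the data rows
assert sums.tolist() == [[8., 6.], [17., 3.], [6., 15.]]
```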
Number of Points per Cluster
78
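The slide's figure is an image; counting points per cluster is an O(n) np.bincount call (a minimal sketch):

```python
import numpy as np

labels = np.array([1, 0, 2, 2, 1])

# np.bincount counts occurrences of each integer label in one O(n) pass;
# minlength guards against clusters with no points at the high end.
counts = np.bincount(labels, minlength=3)
assert counts.tolist() == [1, 2, 2]  # one point in cluster 0, two each in 1 and 2
```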
Putting it all together…
def update_step_matmul(data, labels, k):  # returns centroids
    onehot = np.eye(k)[labels]  # (n, k)
    cluster_sums = onehot.T @ data  # (k, d)
    cluster_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return cluster_sums / cluster_counts  # (k, d)
%timeit update_step_matmul(data, labels, k)
>> 1000 loops, best of 5: 1.03 ms per loop
Tiny improvement… 😔
Time complexity is still O(ndk) because of the matmul.
80
81
What now?
82
Sparse Matrices
83
Sparse Update Step
from scipy import sparse

def update_step_csc(data, labels, k):
    n, d = data.shape
    # constructor: (values, (row_indices, column_indices))
    onehotT = sparse.csc_matrix(
        (np.ones(n), (labels, np.arange(n))), shape=(k, n))
    group_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return onehotT @ data / group_counts

%timeit update_step_csc(data, labels, k)
>> 1000 loops, best of 5: 491 µs per loop
Time complexity: O(d(n+k))
84
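A quick check (not on the slides) that the sparse update matches the original loop-based update on dummy data: the sparse one-hot stores only n nonzeros instead of an n x k dense matrix, so the matmul costs O(nd) rather than O(ndk).

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, d, k = 1000, 8, 5
data = rng.uniform(size=(n, d))
labels = rng.integers(0, k, size=n)

# Sparse transposed one-hot: one stored 1.0 per point.
onehotT = sparse.csc_matrix(
    (np.ones(n), (labels, np.arange(n))), shape=(k, n))
counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
sparse_centroids = onehotT @ data / counts          # (k, d)

# Reference: the original per-cluster mean loop.
dense_centroids = np.stack([data[labels == i].mean(axis=0) for i in range(k)])
assert np.allclose(sparse_centroids, dense_centroids)
```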
Review
85
86
Resources for Further Learning
Thanks for coming!
Fill out the feedback form to be entered in a raffle + get food! bit.ly/w22-raffle-feedback-form
87