1 of 87

Optimizing Machine Learning Code in NumPy & SciPy

Before we start…

  1. Join us on Slido.com! #984636 (used for interactivity + questions)
  2. Slideshow: bit.ly/optimize-numpy
  3. Notebook: bit.ly/optimize-numpy-nb
  4. Blog Post: nicholasvadivelu.com/2021/05/10/fast-k-means/
  5. If you’re not a member, please sign up for updates (it’s free)! bit.ly/dsc-w22-signup
  6. Check out our Discord community, YouTube channel + more: linktr.ee/uwdsc
  7. This workshop will be recorded and uploaded to YouTube

2 of 87

Optimizing Machine Learning Code in NumPy & SciPy

Presented by Nicholas Vadivelu (nicholas.vadivelu@gmail.com, nicv#1748)

3 of 87

3

Workshop Overview

Motivation:

  • We need to implement numerical algorithms efficiently for big data
  • It’s too easy to write slow NumPy & SciPy code

Prereqs:

  • Beginner/Intermediate NumPy/SciPy user
  • Some ML/DS knowledge (K-means)

Goals:

  • Understand how to use broadcasting, vectorization, sparsity

4 of 87

4

How familiar are you with NumPy?


5 of 87

5

How familiar are you with K-means?


6 of 87

6

Workshop Outline

  1. Why NumPy & SciPy?
    1. High-level principles of writing fast NumPy (& SciPy)
  2. K-Means “in 5 minutes”
  3. Implementation
    • Initial Implementation
    • Optimizing Assignment (Broadcasting + Vectorization)
    • Optimizing Update (Matmul + Sparsity)
  4. Wrap up

7 of 87

7

Why not Python?

  • Pure Python can be slow because…
    • Interpreted (not compiled)
    • Single threaded
    • Many indirections

8 of 87

8

Why NumPy (& SciPy)?

  • NumPy provides efficient multi-dimensional arrays
  • SciPy provides efficient numerical algorithms/data structures
  • Unlike Python:
    • Compiled (Fortran/C)
    • Multithreaded + can take advantage of multiple cores
    • Less memory used + memory used more efficiently
    • Highly optimized implementations
    • Standardized
    • etc…
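As a toy illustration of the gap (not a benchmark from the slides; the array and variable names here are made up), compare summing squares with a per-element Python loop versus a single call into NumPy’s compiled code:

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Pure Python: one interpreted iteration (and one float object) per element
loop_total = 0.0
for v in x:
    loop_total += v * v

# NumPy: one call into compiled code, no per-element Python overhead
vec_total = float(np.dot(x, x))

# Same answer, up to floating-point summation order
assert abs(loop_total - vec_total) <= 1e-6 * vec_total
```

Timing the two (e.g. with `%timeit` in the notebook) shows the vectorized version winning by orders of magnitude.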

9 of 87

9

Rules for fast NumPy (SciPy)

  1. Avoid for loops → use vectorization
  2. Understand the memory model
  3. Create a more efficient algorithm
  4. Fewer function calls → fewer allocations → faster code

10 of 87

10

Rules for fast NumPy (SciPy)

  • Avoid for loops → use vectorization
  • Understand the memory model
  • Create a more efficient algorithm
  • Fewer function calls → fewer allocations → faster code

BENCHMARK

11 of 87

11

A quick note on the memory model…

>> array
[10, 20, 30, 40, 50, 60, 70, 80]

>> array.reshape((4, 2))
[[10, 20], [30, 40], [50, 60], [70, 80]]

>> array.reshape((4, 2))[::2]
[[10, 20], [50, 60]]

>> array.reshape((4, 2)).T
[[10, 30, 50, 70], [20, 40, 60, 80]]

12 of 87


13 of 87

K-Means “in 5 minutes”

13

14 of 87

14

What is K-Means?

  • “Unsupervised clustering algorithm”
  • Smartphone maker wants to decide what size phones to make
    • Survey customers for preferred size (height, width)
    • Want to decide on 3 sizes that’ll be the closest fit for the most people (i.e. minimize squared error on size)
    • Can use K-means on collected data!

15 of 87

15

How does it work?

Input: Dataset and the # of clusters (K)

Goal: Assign each point to one of the K clusters

Process:

  1. Choose K initial points randomly (initial centroids)
  2. Assign each datapoint to the cluster of its closest centroid
  3. Update each centroid to the average of its assigned points
  4. Repeat steps 2–3 until the groups no longer change

16 of 87

K-Means Clustering: Demo

  • Random init centroids
  • Points assigned to closest centroids
  • Centroids of initial clusters calculated

16

K = 3, Iteration = 0

K = 3, Iteration = 1

17 of 87

K-Means Clustering: Demo

  • Centroids of initial clusters calculated
  • Reassign points
  • Compute centroids

17

K = 3, Iteration = 1

K = 3, Iteration = 2

18 of 87

K-Means Clustering: Demo

  • Reassign points
  • Compute centroids
  • Reassign points
  • Compute centroids

18

K = 3, Iteration = 2

K = 3, Iteration = 3

19 of 87

K-Means Clustering: Demo

19

K = 3, Iteration = 3

K = 3, Iteration = 4

K = 3, Iteration = 5

20 of 87


21 of 87

21

Initial Implementation

def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    labels = np.zeros(n)  # (n,)
    for _ in range(num_iter):
        # ASSIGNMENT STEP
        for i, point in enumerate(data):
            distances = [np.linalg.norm(point - c) for c in centroids]
            labels[i] = np.argmin(distances)  # idx of closest centroid
        # UPDATE STEP (average within each group)
        centroids = np.stack([data[labels == i].mean(axis=0)
                              for i in range(k)])  # (k, d)
    return centroids

22 of 87

Walkthrough of the implementation:

  • Setup: sample k of the n datapoints as the initial centroids; allocate labels
  • Fixed iteration count, instead of a convergence/stopping condition
  • Assignment step: label each point with the index of its closest centroid
  • Update step: recompute each centroid as the mean of its assigned points
  • Return: the final centroids

Refactored Implementation

def kmeans(data, k, num_iter=50):
    n, d = data.shape
    centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)
    for _ in range(num_iter):
        labels = assignment_step(data, centroids)  # (n,)
        centroids = update_step(data, labels, k)  # (k, d)
    return centroids

n, k, d = 5000, 26, 26
data = np.random.uniform(size=(n, d))  # dummy data

%timeit kmeans(data, k)  # magic function from Jupyter Notebooks
>> 1 loop, best of 5: 51.7 s per loop

28 of 87

28

Try it out!

Slideshow: bit.ly/optimize-numpy

Notebook: bit.ly/optimize-numpy-nb

29 of 87


30 of 87

30

How fast were you able to make kmeans/assignment_step/update_step?


31 of 87

Optimizing the Assignment Step

31

32 of 87

32

Original Assignment Step

def assignment_step_v1(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = [np.linalg.norm(point - c) for c in centroids]
        labels[i] = np.argmin(distances)
    return labels

centroids = data[np.random.choice(n, k, replace=False)]  # (k, d)

%timeit assignment_step_v1(data, centroids)
>> 1 loop, best of 5: 1.01 s per loop

33 of 87


34 of 87

34

Broadcasting Distance Computation

def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels

%timeit assignment_step_v2(data, centroids)
>> 10 loops, best of 5: 72.6 ms per loop

35 of 87

35

What is “Broadcasting”?

  • How NumPy does operations with arrays of different shapes

36 of 87


37 of 87

37

Broadcasting Rules

e.g.

a: 4 x 3

b: 3

38 of 87

38

Broadcasting Rules

  1. Right-align the shapes

e.g.

a: 4 x 3

b: 3

39 of 87

39

Broadcasting Rules

  • Right-align the shapes
  • Left-pad shorter shapes with 1s

e.g.

a: 4 x 3

b: 1 x 3

40 of 87

40

Broadcasting Rules

  • Right-align the shapes
  • Left-pad shorter shapes with 1s
  • Ensure each corresponding dim either matches OR has 1s

e.g.

a: 4 x 3

b: 1 x 3

41 of 87

41

Broadcasting Rules

  • Right-align the shapes
  • Left-pad shorter shapes with 1s
  • Ensure each corresponding dim either matches OR has 1s
  • “Stretch” dimensions of size 1 to match other arrays

e.g.

a: 4 x 3

b: 4 x 3
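The rules above can be checked directly in the notebook (toy arrays, chosen here just for illustration): `b` of shape (3,) is right-aligned against (4, 3), padded to (1, 3), then stretched to (4, 3):

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # shape (4, 3)
b = np.array([10, 20, 30])       # shape (3,) -> padded to (1, 3) -> stretched to (4, 3)

c = a + b
assert c.shape == (4, 3)
assert c[0].tolist() == [10, 21, 32]  # row 0 of a is [0, 1, 2]
```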

42 of 87

42

Broadcasting Fail Example

  • Right-align the shapes
  • Left-pad shorter shapes with 1s
  • Ensure each corresponding dim either matches OR has 1s

e.g.

a: 4 x 3

b: 1 x 4
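This failure is easy to reproduce (toy shapes matching the slide): after right-aligning, the trailing dimensions are 3 and 4, and neither is 1, so NumPy raises a ValueError:

```python
import numpy as np

a = np.ones((4, 3))
b = np.ones((1, 4))  # trailing dims 3 vs 4: neither is 1, so broadcasting fails

try:
    a + b
    raised = False
except ValueError:
    raised = True
assert raised
```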

43 of 87

43

Broadcasting Example 2

e.g.

a: 4 x 1

b: 3

44 of 87

44

Broadcasting Example 2

  • Right-align the shapes

e.g.

a: 4 x 1

b: 3

45 of 87

45

Broadcasting Example 2

  • Right-align the shapes
  • Left-pad shorter shapes with 1s

e.g.

a: 4 x 1

b: 1 x 3

46 of 87

46

Broadcasting Example 2

  • Right-align the shapes
  • Left-pad shorter shapes with 1s
  • Ensure each corresponding dim either matches OR has 1s

e.g.

a: 4 x 1

b: 1 x 3

47 of 87

47

Broadcasting Example 2

  • Right-align the shapes
  • Left-pad shorter shapes with 1s
  • Ensure each corresponding dim either matches OR has 1s
  • “Stretch” dimensions of size 1 to match other arrays

e.g.

a: 4 x 3

b: 4 x 3
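In this second example both arrays stretch: (4, 1) stretches along columns and (1, 3) along rows, producing an outer-sum style (4, 3) result. A quick check (toy values for illustration):

```python
import numpy as np

a = np.arange(4).reshape(4, 1)  # (4, 1)
b = np.arange(3) * 10           # (3,) -> (1, 3); both stretch to (4, 3)

c = a + b
assert c.shape == (4, 3)
assert c.tolist() == [[0, 10, 20], [1, 11, 21], [2, 12, 22], [3, 13, 23]]
```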

48 of 87


49 of 87

49

Broadcasting Distance Computation

def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels

point    :     d →     d → 1 x d → k x d
centroids: k x d → k x d → k x d → k x d

50 of 87

50

Broadcasting Distance Computation

def assignment_step_v2(data, centroids):
    labels = np.empty(data.shape[0])  # (n,)
    for i, point in enumerate(data):
        distances = np.linalg.norm(point - centroids, axis=1)  # (k,)
        labels[i] = np.argmin(distances)
    return labels

point - centroids: k x d

  • np.linalg.norm(..., axis=1) treats this as k different d-dimensional arrays, returning k different norms

51 of 87


52 of 87

52

Assignment Step v3

def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.linalg.norm(diff, axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels

%timeit assignment_step_v3(data, centroids)
>> 100 loops, best of 5: 18.1 ms per loop

53 of 87

53

Assignment Step v3

Note:

data     : n x d     data[:, None]  : n x 1 x d
centroids: k x d     centroids[None]: 1 x k x d

54 of 87


55 of 87


56 of 87


57 of 87

57

Assignment Step v3 - Improving Norm

def assignment_step_v3(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.sqrt((diff**2).sum(axis=2))  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels

Do we need that square root?

58 of 87

58

Assignment Step v4 - Improving Norm

def assignment_step_v4(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = (diff**2).sum(axis=2)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels

Do we need that square root? No: the square root is monotonic, so it doesn’t affect the argmin.

>> 100 loops, best of 5: 17.8 ms per loop
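The monotonicity argument is easy to verify numerically (random nonnegative squared distances, names invented for this sketch): sqrt preserves ordering on [0, ∞), so the per-row argmin is identical with or without it:

```python
import numpy as np

rng = np.random.default_rng(0)
sq_dist = rng.uniform(size=(100, 5))  # nonnegative "squared distances", shape (n, k)

# sqrt is monotonic on [0, inf), so the index of the row minimum is unchanged
with_sqrt = np.argmin(np.sqrt(sq_dist), axis=1)
without_sqrt = np.argmin(sq_dist, axis=1)
assert (with_sqrt == without_sqrt).all()
```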

59 of 87

59

Assignment Step v5 - Einsum

def assignment_step_v5(data, centroids):
    diff = data[:, None] - centroids[None]  # (n, k, d)
    distances = np.einsum('nkd,nkd->nk', diff, diff)  # (n, k)
    labels = np.argmin(distances, axis=1)  # (n,)
    return labels

>> 100 loops, best of 5: 9.94 ms per loop

60 of 87

60

What is Einsum?

  • A compact way to express products and sums of arrays
  • Advantages:
    • Avoid creating intermediate arrays
    • Improve readability of complex operations
    • Automatically optimize the order of operations
  • Out of scope for this workshop 😬
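A full einsum tutorial is out of scope, but the one subscript string used above is worth decoding: 'nkd,nkd->nk' multiplies the two arrays elementwise and sums over the d axis, i.e. squared distances, without materializing diff**2 as a separate (n, k, d) array. A quick equivalence check (toy shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(5, 3, 2))  # (n, k, d)

# 'nkd,nkd->nk': elementwise product, summed over the d axis
a = np.einsum('nkd,nkd->nk', diff, diff)
b = (diff ** 2).sum(axis=2)

assert a.shape == (5, 3)
assert np.allclose(a, b)
```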

61 of 87


62 of 87

Optimizing the Update Step

62

63 of 87

63

Original Update Step

def update_step_original(data, labels, k):
    return np.stack([data[labels == i].mean(axis=0) for i in range(k)])

labels = np.random.randint(0, k, size=n)  # dummy labels

%timeit update_step_original(data, labels, k)
>> 1000 loops, best of 5: 1.15 ms per loop

64 of 87

64

Original Update Step - Time Complexity

def update_step_original(data, labels, k):  # returns centroids
    return np.stack([data[labels == i].mean(axis=0) for i in range(k)])

%timeit update_step_original(data, labels, k)
>> 1000 loops, best of 5: 1.15 ms per loop

Time Complexity:

labels == i → O(n)
data[labels == i] → O(nd)
data[labels == i].mean(axis=0) → O(nd)
[data[labels == i].mean(axis=0) for i in range(k)] → O(ndk)

65 of 87

65

One-hot matrix

  • labels contain the centroid index for each datapoint
  • Want a matrix with one “one-hot” row per datapoint

labels
>> [1, 0, 2, 2, 1]

np.eye(k)
>> [[1. 0. 0.]
    [0. 1. 0.]
    [0. 0. 1.]]

np.eye(k)[labels]  # onehot
>> [[0. 1. 0.]
    [1. 0. 0.]
    [0. 0. 1.]
    [0. 0. 1.]
    [0. 1. 0.]]
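The indexing trick works because row i of the identity matrix is the one-hot vector for class i, so fancy-indexing with labels stacks the right rows. Using the labels from the slide:

```python
import numpy as np

k = 3
labels = np.array([1, 0, 2, 2, 1])

onehot = np.eye(k)[labels]  # row i of np.eye(k) is the one-hot vector for class i
assert onehot.shape == (5, 3)
assert onehot.tolist() == [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]]
```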

72 of 87

Sum by Centroid Group (n=5, d=2, k=3)

onehot.T @ data = sum_by_cluster

                        [[8. 0.]
[[0. 1. 0. 0. 0.]        [8. 6.]      [[ 8.  6.]
 [1. 0. 0. 0. 1.]   @    [1. 7.]   =   [17.  3.]
 [0. 0. 1. 1. 0.]]       [5. 8.]       [ 6. 15.]]
                         [9. 3.]]

Recall: labels = [1, 0, 2, 2, 1]
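Each row of onehot.T picks out the points with that label, so the matmul sums the datapoints cluster by cluster. Reproducing the slide’s numbers:

```python
import numpy as np

labels = np.array([1, 0, 2, 2, 1])
data = np.array([[8., 0.], [8., 6.], [1., 7.], [5., 8.], [9., 3.]])  # (n=5, d=2)

onehot = np.eye(3)[labels]        # (n, k)
sum_by_cluster = onehot.T @ data  # (k, d): row j sums the points with label j

assert sum_by_cluster.tolist() == [[8., 6.], [17., 3.], [6., 15.]]
```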

77 of 87

77

Number of Points per Cluster

  • onehot.sum(axis=0) → [1., 2., 2.]
    • O(nk), since we sum n elements for each of the k centroids
  • np.bincount(labels, minlength=k) → [1, 2, 2]
    • O(n + k): allocate the length-k array once, then iterate through labels once
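Both counting approaches agree; bincount just does asymptotically less work. With the slide’s labels:

```python
import numpy as np

k = 3
labels = np.array([1, 0, 2, 2, 1])

counts_slow = np.eye(k)[labels].sum(axis=0)     # O(nk): sums the dense one-hot matrix
counts_fast = np.bincount(labels, minlength=k)  # O(n + k): single pass over labels

assert counts_fast.tolist() == [1, 2, 2]
assert np.array_equal(counts_slow, counts_fast)
```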

78 of 87

78

Putting it all together…

def update_step_matmul(data, labels, k):  # returns centroids
    onehot = np.eye(k)[labels]  # (n, k)
    cluster_sums = onehot.T @ data  # (k, d)
    cluster_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return cluster_sums / cluster_counts  # (k, d)

%timeit update_step_matmul(data, labels, k)
>> 1000 loops, best of 5: 1.03 ms per loop

79 of 87

79

Putting it all together…

Tiny improvement… 😔

Time complexity is still O(ndk): the dense matmul touches all n*k entries of onehot.

80 of 87


81 of 87

81

What now?

  • onehot only has n non-zero values
    • but our matmul uses all n*k values
  • How can we multiply only the non-zero values? Sparse matrices!

82 of 87

82

Sparse Matrices

  • Sparse matrices (arrays) store the non-zero values and the positions those values occupy in the full matrix (array)
  • SciPy has many formats optimized for different tasks; for speed we care about:
    • Compressed Sparse Row (CSR)
    • Compressed Sparse Column (CSC)
  • CSR is more space-efficient with fewer rows; CSC with fewer columns
    • Hard to tell which will be faster for a task, so BENCHMARK!

83 of 87

83

Sparse Update Step

from scipy import sparse

def update_step_csc(data, labels, k):
    n, d = data.shape
    # constructor: (values, (row_indices, column_indices))
    onehotT = sparse.csc_matrix(
        (np.ones(n), (labels, np.arange(n))), shape=(k, n))
    group_counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
    return onehotT @ data / group_counts

%timeit update_step_csc(data, labels, k)
>> 1000 loops, best of 5: 491 µs per loop

Time complexity: O(d(n + k))
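A quick consistency check (toy sizes chosen here, not the workshop benchmark) that the sparse update produces the same centroids as the dense one-hot matmul:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, d, k = 200, 4, 5
data = rng.uniform(size=(n, d))
labels = rng.integers(0, k, size=n)
labels[:k] = np.arange(k)  # guarantee every cluster is non-empty

# Dense one-hot matmul version
onehot = np.eye(k)[labels]
counts = np.bincount(labels, minlength=k)[:, None]  # (k, 1)
dense_centroids = onehot.T @ data / counts

# Sparse version: store only the n non-zeros of onehot.T
onehotT = sparse.csc_matrix(
    (np.ones(n), (labels, np.arange(n))), shape=(k, n))
sparse_centroids = onehotT @ data / counts

assert np.allclose(dense_centroids, sparse_centroids)
```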

84 of 87

84

Review

  • Broadcasting can help us vectorize loops in NumPy
  • Mathematical tricks can reduce work or help vectorize
  • Improving the asymptotic complexity has the biggest impact, and tools like sparse matrices can help with that

85 of 87


86 of 87

86

Resources for Further Learning

87 of 87

Thanks for coming!

Fill out the feedback form to be entered in a raffle + get food! bit.ly/w22-raffle-feedback-form

87