1 of 51

Simplicity

Joel Grus

Southern Data Science Conference 2022

2 of 51

3 of 51

Simplicity

4 of 51

What's the simplest thing that might possibly work?

5 of 51

What's the simplest thing that might possibly work? And why didn't you try that first?

6 of 51

Simplicity

7 of 51

"There's a lot to love about the ML community, but one thing I don't love is the cult of complexity where you get respect by using big equations, big words, and big models."

https://twitter.com/_brohrer_/status/1553811938886520842

8 of 51

"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better."

– Edsger W. Dijkstra

9 of 51

"Simple solutions are easier to implement and scale than complex solutions. .... We need to start acknowledging the power of simple."

https://twitter.com/bryanl/status/1555619528096235525

10 of 51

11 of 51

12 of 51

13 of 51

14 of 51

15 of 51

Complexity?

16 of 51

Model Complexity

17 of 51

18 of 51

"Seek simplicity, and distrust it."

– Alfred North Whitehead

19 of 51

Text Classification

20 of 51

Naive Bayes model

  • chop email into words
  • P(spam|words) ∝ P(words|spam) P(spam)
  • = P(word₁|spam) … P(wordₙ|spam) P(spam)

  • 2n + 2 parameters
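The recipe above (the 2n + 2 parameters are two class priors plus a per-word probability for each class) can be sketched in a few lines of plain Python. The toy training data is hypothetical, and the probabilities are combined in log space to avoid underflow:

```python
from collections import Counter
from math import log, exp

# Hypothetical toy training data: (words, is_spam) pairs.
train = [
    (["win", "money", "now"], True),
    (["cheap", "money", "offer"], True),
    (["meeting", "tomorrow", "agenda"], False),
    (["lunch", "tomorrow"], False),
]

spam_counts, ham_counts = Counter(), Counter()
n_spam = sum(1 for _, is_spam in train if is_spam)
n_ham = len(train) - n_spam
for words, is_spam in train:
    (spam_counts if is_spam else ham_counts).update(words)

def p_spam(words, k=1.0):
    """P(spam | words) via Bayes' rule, with add-k smoothing."""
    log_spam = log(n_spam / len(train))   # log P(spam)
    log_ham = log(n_ham / len(train))     # log P(ham)
    for w in words:
        log_spam += log((spam_counts[w] + k) / (n_spam + 2 * k))
        log_ham += log((ham_counts[w] + k) / (n_ham + 2 * k))
    return exp(log_spam) / (exp(log_spam) + exp(log_ham))

print(p_spam(["money", "now"]))         # well above 0.5
print(p_spam(["meeting", "tomorrow"]))  # well below 0.5
```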

21 of 51

Logistic Regression model

  • convert email to feature vector x1 … xn
  • fit model
  • log (p/(1-p)) = b0 + b1 x1 + … + bn xn

  • (n + 1) parameters
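As a sketch of the (n + 1)-parameter model above, here is a tiny from-scratch fit by stochastic gradient ascent. The binary features and data are hypothetical (think x = [contains "money", contains "meeting"]); in practice you would reach for a library like scikit-learn:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit b0 + b1*x1 + ... + bn*xn by stochastic gradient ascent."""
    n = len(xs[0])
    b = [0.0] * (n + 1)  # (n + 1) parameters: intercept plus one weight per feature
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = sigmoid(b[0] + sum(bi * xi for bi, xi in zip(b[1:], x)))
            err = y - p                 # gradient of the log-likelihood
            b[0] += lr * err
            for i in range(n):
                b[i + 1] += lr * err * x[i]
    return b

def predict(b, x):
    return sigmoid(b[0] + sum(bi * xi for bi, xi in zip(b[1:], x)))

# Hypothetical features: x = [contains "money", contains "meeting"]
xs = [[1, 0], [1, 0], [0, 1], [0, 1]]
ys = [1, 1, 0, 0]
b = fit_logistic(xs, ys)
print(predict(b, [1, 0]))  # high spam probability
print(predict(b, [0, 1]))  # low spam probability
```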

22 of 51

LSTM Model

  • tokenize email and convert to a sequence of embeddings
  • run the embeddings through an LSTM and get a final "state"
  • learn to classify the final hidden state

  • thousands of parameters*
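To make the starred "thousands of parameters" concrete, one can count the weights in a single-layer LSTM classifier. The embedding and hidden sizes below are hypothetical illustrations, and the embedding table itself is excluded (more on that shortly):

```python
def lstm_param_count(embedding_dim, hidden_dim, num_classes=2):
    """Parameters in a single-layer LSTM classifier, excluding the embedding table.

    Each of the four gates (input, forget, cell, output) has an
    input-to-hidden weight matrix, a hidden-to-hidden weight matrix,
    and a bias vector.
    """
    gates = 4 * (hidden_dim * embedding_dim   # input-to-hidden weights
                 + hidden_dim * hidden_dim    # hidden-to-hidden weights
                 + hidden_dim)                # biases
    classifier = hidden_dim * num_classes + num_classes  # final linear layer
    return gates + classifier

# Hypothetical sizes: 50-dim embeddings, 64-dim hidden state.
print(lstm_param_count(50, 64))  # tens of thousands
```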

23 of 51

BERT model

  • convert email to
    • wordpiece embeddings
    • segment embeddings
    • positional embeddings
  • feed to transformer model
  • use pretrained final [CLS] embedding as input to classifier
  • fine-tune

  • 110M parameters

24 of 51

Which is simplest?

  1. Naive Bayes
  2. Logistic Regression
  3. LSTM
  4. BERT

25 of 51

Which would you recommend?

  • Naive Bayes
  • Logistic Regression
  • LSTM
  • BERT

26 of 51

Which would you recommend?

  • Naive Bayes
  • Logistic Regression
  • LSTM
  • BERT

27 of 51

Free Riding

LSTM Model

GloVe vectors

28 of 51

BERT model

  • convert email to
    • wordpiece embeddings
    • segment embeddings
    • positional embeddings
  • feed to transformer model
  • use pretrained final [CLS] embedding as input to classifier
  • fine-tune

  • 110M parameters

29 of 51

BERT model

  • convert email to
    • wordpiece embeddings
    • segment embeddings
    • positional embeddings
  • feed to transformer model
  • use pretrained final [CLS] embedding as input to classifier
  • fine-tune

  • 110M parameters

compute [CLS] embedding

thousands of parameters?

30 of 51

Hidden Complexity

31 of 51

"Out of clutter, find simplicity."

– Albert Einstein

32 of 51

A digression:

Woodworking

33 of 51

34 of 51

35 of 51

36 of 51

37 of 51

38 of 51

BERT in 2018

39 of 51

BERT 2022

from transformers import (
    AutoTokenizer,
    DataCollatorWithPadding,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

training_args = TrainingArguments(...)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

40 of 51

Another example:

Sorting
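Sorting makes the same point: what was once a classic exercise in algorithmic complexity is now, with better tools, a single call to the standard library (the example strings are hypothetical):

```python
data = [5, 3, 1, 4, 2]

# Decades of algorithmic research (and a well-engineered standard
# library) hide behind this one call.
print(sorted(data))

# The same call handles richer cases via a key function.
emails = ["Re: lunch", "WIN MONEY NOW", "agenda"]
print(sorted(emails, key=str.lower))  # case-insensitive order
```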

41 of 51

As our tools get better, the boundary between "simple" and "complex" changes.

42 of 51

"Simple solutions are easier to implement and scale than complex solutions. .... We need to start acknowledging the power of simple."

https://twitter.com/bryanl/status/1555619528096235525

43 of 51

"I’m often asked: Why use AngelList to run a fund?

My answer: AngelList abstracts away *all* the complexity associated w/ starting & running a fund"

https://twitter.com/avlok/status/1564350808002478080

44 of 51

Abstract Away the Complexity

45 of 51

Create Jigs to Abstract Away the Complexity

  • pretrained models
  • modeling libraries
  • project templates
  • clean APIs
  • best practices
  • shared processes

46 of 51

"Manifest plainness,

Embrace simplicity,

Reduce selfishness,

Have few desires."

– Lao-tzu

47 of 51

Manifest plainness

48 of 51

Reduce selfishness

49 of 51

Have few desires

50 of 51

Embrace simplicity

51 of 51

Thanks!