Simplicity
Joel Grus
Southern Data Science Conference 2022
What's the simplest thing that might possibly work?
What's the simplest thing that might possibly work? And why didn't you try that first?
Simplicity
"There's a lot to love about the ML community, but one thing I don't love is the cult of complexity where you get respect by using big equations, big words, and big models."
"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better."
- Edsger W. Dijkstra
"Simple solutions are easier to implement and scale than complex solutions. .... We need to start acknowledging the power of simple."
Complexity?
Model Complexity
"Seek simplicity, and distrust it."
- Alfred North Whitehead
Text Classification
Naive Bayes model
Logistic Regression model
LSTM Model
BERT model
Which is simplest?
Which would you recommend?
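To make the low end of this list concrete, here is a minimal sketch of a Naive Bayes text classifier using only the Python standard library. It is an illustration, not the code behind the talk: the class name `NaiveBayes`, the whitespace tokenizer, and the four training sentences are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior of the label.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Add the smoothed log likelihood of the word given the label.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data for illustration only.
train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("loved the great acting")
print(prediction)
```

The whole model is two tables of counts, which is a large part of why it usually sits at the "simple" end of the list.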
Free Riding
LSTM Model
GloVe vectors
BERT model
compute [CLS] embedding
thousands of parameters?
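One reading of the "thousands of parameters?" question is that it counts only the small classifier trained on top of the [CLS] embedding and ignores the pretrained encoder underneath. The arithmetic below sketches that reading. The sizes are assumptions not stated on the slides: a 768-dimensional embedding (the hidden size of BERT-base), two labels, and roughly 110 million encoder parameters.

```python
# Assumed sizes, not taken from the slides: BERT-base has a 768-dimensional
# hidden state and roughly 110 million parameters.
HIDDEN_SIZE = 768
NUM_LABELS = 2
BERT_BASE_PARAMS = 110_000_000  # approximate


def linear_head_params(input_dim, num_labels):
    """Weights plus biases of a single linear (logistic regression) layer."""
    return input_dim * num_labels + num_labels


head_params = linear_head_params(HIDDEN_SIZE, NUM_LABELS)
total_params = head_params + BERT_BASE_PARAMS

print(f"trained head only: {head_params:,} parameters")
print(f"head plus encoder: {total_params:,} parameters")
```

Under these assumptions the head alone has 1,538 parameters, so the model looks tiny only as long as the encoder it rides on goes uncounted.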
Hidden Complexity
"Out of clutter, find simplicity."
- Albert Einstein
A digression:
Woodworking
BERT in 2018
BERT in 2022
from transformers import (
    AutoTokenizer,
    DataCollatorWithPadding,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(...)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
Another example:
Sorting
As our tools get better, the boundary between "simple" and "complex" changes.
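The slide gives only the word "Sorting", so the following is an assumed illustration of the point rather than the speaker's own example. It puts a hand-written merge sort (the function name `merge_sort` and the sample list are made up here) next to Python's built-in `sorted`, which hides a more sophisticated algorithm behind a single call.

```python
def merge_sort(items):
    """Return a new sorted list using a hand-written top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves, taking the smaller head each time.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


data = [5, 2, 9, 1, 5, 6]  # made-up sample input

by_hand = merge_sort(data)  # the version you have to write and debug
built_in = sorted(data)     # the version the language hands you

print(by_hand)
print(built_in)
```

Both calls return the same list. Once the tool exists, the one-line version is the simple choice, even though the algorithm underneath it is not.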
"Simple solutions are easier to implement and scale than complex solutions. .... We need to start acknowledging the power of simple."
"I'm often asked: Why use AngelList to run a fund?
My answer: AngelList abstracts away *all* the complexity associated w/ starting & running a fund"
Abstract Away the Complexity
Create Jigs to Abstract Away the Complexity
"Manifest plainness,
Embrace simplicity,
Reduce selfishness,
Have few desires."
- Lao-tzu
Manifest plainness
Reduce selfishness
Have few desires
Embrace simplicity
Thanks!