Demystifying Data Science
Joel Grus
@joelgrus
who am I?
research engineer at AI2
previously
wrote a book!
co-host a podcast
please listen to it!
write a blog
How did I become a data scientist?
NOW I'M A DATA SCIENTIST
TRY TO GET SPARK TO WORK
TRY TO GET MYSQL TO WORK
TRY TO GET R TO WORK
TRY TO GET AWK TO WORK
TRY TO GET SCIKIT TO WORK
TRY TO GET TENSORFLOW TO WORK
TRY TO GET HADOOP TO WORK
TRY TO GET MATPLOTLIB TO WORK
TRY TO GET DOCKER TO WORK
TRY TO GET D3 TO WORK
SAY "BIG DATA"
TWEET
ATTEND STRATA
SCRAPE AMAZON, GET BANNED
how did data science become me?
Grad School
2010
2011
what do I do on a daily basis?
but what do i DO on a daily basis?
AI2 ("Research engineer")
Google ("Software engineer")
VoloMetrix ("Chief scientist")
Decide ("Analyst")
Farecast ("analyst / fareologist")
hedge fund ("Senior analyst")
grad school ("grad student")
what is data science like in practice?
heuristic/joke:
two types of data scientists
type A - the analyst
what's a unit test?
type B - the builder
what's a train-test split?
funny, but doesn't give you the full picture
type C - the conformist
I read on Hacker News that everyone is using Keras, so maybe we should too
type D - the deep learner
Keras or GTFO
type E - the educator
you're using Keras all wrong, give me the keyboard
type F - the failure
I can't get Keras to work
type G - the go-getter
I signed up for the Keras MOOC and, like, 10 other MOOCs too
type H - the hater
I hate Keras
type I - the inventor
here's my new library, I call it Keras
type J - the jerk
maybe I should just put a Keras joke on every slide
type K - the Kaggler
my first attempt only got me to 61%, but then I stayed up all night for a week renting GPU instances on Amazon, and now I'm getting close to breaking into the top 100. that will get me a job, right?
type L - the lifer
I've been doing data science since before data science was even a thing!
type M - the moocher
hey, can I use your Spark cluster?
type N - the nerd
did you see the new paper on dynamic adversarial generalized deep recurrent reinforcement memory networks on arXiv?
type O - the overqualified
I printed out that pie chart you wanted, it's over there next to my Physics PhD
type P - the p-hacker
sure, the first 19 results weren't significant at the 5% level, but...
type Q - the questioner
why would you categorize data scientists when "data science" is supposed to be an umbrella term?
type R - the R user
set_nas <- function(x) ifelse(is.na(x) | !str_detect(x, "SP"), NA, x)
type S - the self-promoter
hey, have you bought my book and listened to my podcast and read my blog and followed me on twitter?
type T - the thought leader
there are 26 types of data scientists, as you can see in this combination Venn diagram / Gartner hype cycle
type U - the unicorn
the most important skill for a data scientist is empathy!
type V - the Venn-diagram sharer
type W - the worrier
oh god, what if everyone understands deep learning except me?
type X - the xenophobe
nothing against scikit-learn, I just feel more comfortable using my own implementations
type Y - the yeller
DATA SCIENCE IS THE SEXIEST JOB OF THE 21ST CENTURY! #Data #DataScience #Analytics #BigData #Innovation
type Z - the zookeeper
I'm proficient in Hadoop, Pig, Python, Pandas, Anaconda, Hive, Ant, Giraph, Oozie, Capybara, Orangutan, Coelacanth, ...
in practice, data science looks like a lot of different things!
should we categorize data scientists?
think of it as "clustering" instead
type S - the self-promoter
hey, have you bought my book and listened to my podcast and read my blog and followed me on twitter?
THANKS!
joelgrus.com
@joelgrus