1 of 48

Right Code

Right Place

Right Time

Tim Hopper

Senior Data Scientist

Cylance, Inc.

🐦 @tdhopper

💻 tdhopper.com

📧 tdhopper@gmail.com

🖥 bit.ly/pydata2018

2 of 48

3 of 48

What 2010 Tim thought I’d do

4 of 48

What my wife thinks I do

5 of 48

What my CEO thinks I do

6 of 48

What my boss thinks I do

7 of 48

What I want people on Twitter to think I do

8 of 48

What I actually do

9 of 48

Install Python dependencies for exploratory analysis

10 of 48

Filter and extract features from data snapshots on S3

11 of 48

Spin up AWS spot instance with enough RAM to load data into Pandas

12 of 48

Configure EC2 instances and VPN to make Jupyter server accessible locally

13 of 48

Translate Scikit results into form that can be re-implemented in production

14 of 48

Extract code from random scripts and notebooks into Python package

15 of 48

Write shell script to bootstrap EC2 machines to reproduce analysis

16 of 48

Move Python environments inside Docker containers

17 of 48

Figure out how to share Docker images with the rest of the team

18 of 48

Build dashboard to monitor performance of model predictions

19 of 48

Schedule reporting tasks to run nightly

20 of 48

Do whatever people do with Kubernetes

21 of 48

Configure AWS permissions

22 of 48

23 of 48

Data Scientists Solve Problems

24 of 48

https://www.youtube.com/watch?v=Av07QiqmsoA

25 of 48

VPs of Engineering Don’t Want Data Scientists Being Engineers

26 of 48

Production models

Model building and deployment pipelines

27 of 48

Great Data Science

Needs

Great Engineering

28 of 48

29 of 48

Great Engineering

Needs

Infrastructure

and Operations

30 of 48

31 of 48

DevOps Teams

Can Hinder

DevOps Practice

32 of 48

33 of 48

34 of 48

DataSciDevOps?

(MLEngOps?)

35 of 48

Great Machine Learning

Requires

Great Engineering and Great Operations

36 of 48

Tim Hopper

Senior Data Scientist

Cylance, Inc.

🐦 @tdhopper

💻 tdhopper.com

📧 tdhopper@gmail.com

🖥 bit.ly/pydata2018

?

37 of 48

38 of 48

Right Code

Right Place

Right Time

39 of 48

Is my code correct?

Right Code

40 of 48

Are my dependencies available?

Right Code
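
One way to ask this question of an environment before an analysis runs is sketched below. It is only an illustration: the function name `missing_dependencies` and the list of required modules are hypothetical, and the check covers top-level module names only.

```python
from importlib import util


def missing_dependencies(module_names):
    """Return the top-level module names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if util.find_spec(name) is None:
            missing.append(name)
    return missing


# Hypothetical requirement list; the last name is deliberately bogus.
required = ["json", "csv", "module_that_does_not_exist_xyz"]
print(missing_dependencies(required))
```

Running a check like this at the top of a script or notebook surfaces a missing package up front instead of partway through a long job.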

41 of 48

Is my configuration correct?

Right Code
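
A minimal sketch of checking configuration before code runs, assuming the configuration is a plain dictionary. The keys `s3_bucket` and `n_estimators`, and the function `validate_config`, are made up for illustration.

```python
def validate_config(config, required_keys):
    """Return a list of problems found in config; an empty list means it passed."""
    problems = []
    for key, expected_type in required_keys.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


# Hypothetical schema and config; n_estimators is a string by mistake.
REQUIRED = {"s3_bucket": str, "n_estimators": int}
config = {"s3_bucket": "example-bucket", "n_estimators": "100"}
print(validate_config(config, REQUIRED))
```

Collecting every problem into a list, rather than raising on the first one, lets a single run report all configuration mistakes at once.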

42 of 48

Are internal libraries readily available to coworkers?

Right Place

43 of 48

Are deployments automated?

Right Place

44 of 48

Is my virtual network correctly configured?

Right Place

45 of 48

Are my configuration and provisioning automated?

Right Place

46 of 48

Can I easily run code on a schedule?

Right Time

47 of 48

Do I have visibility into its status and history?

Right Time

48 of 48

Can I provision infrastructure on-demand for ad-hoc jobs?

Right Time