1 of 8

Jupyter Notebook: Considered Harmful

Maksim Tsvetovat • 06.28.2019

2 of 8

The Promise

Create and Share Documents

that contain live code, equations, visualizations and narrative text

Experiment Effortlessly

  • Test and document code quickly
  • Iterate, experiment, play with ideas

Lower barrier to entry

Does not require one to set up a full development environment, learn shell commands, or deal with “old school” command line interfaces

3 of 8

Grim Reality, Part 1

Spaghetti Code

  • Notebook cells provide an illusion of structured code
  • Underlying code (with notebook cells removed) is still sequential and unstructured

Out-of-order execution

  • Experimental nature of notebooks encourages executing notebook cells out of order
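A minimal sketch of why out-of-order execution is risky. It is not how Jupyter is implemented: each string below merely stands in for a notebook cell, and a plain dict plays the role of the kernel's shared namespace. The names `cells`, `run`, and `prices` are made up for illustration.

```python
# Each string stands in for one notebook cell; a dict plays the role
# of the kernel's single shared global namespace (an assumption made
# for illustration, not Jupyter's actual machinery).
cells = {
    1: "prices = [10, 20, 30]",
    2: "prices = [p * 2 for p in prices]",
    3: "total = sum(prices)",
}


def run(order):
    """Execute the cells in the given order against a fresh namespace."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["total"]


top_to_bottom = run([1, 2, 3])     # cell 2 applied once
rerun_cell_2 = run([1, 2, 2, 3])   # cell 2 applied twice
skipped_cell_2 = run([1, 3])       # cell 2 never applied

print(top_to_bottom, rerun_cell_2, skipped_cell_2)
```

The same three cells yield three different totals depending only on which cells were run and how often, and nothing in the saved notebook records which order produced the result on screen.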

4 of 8

Grim Reality, Part 2

Global Variables

  • All variables defined at cell level are global
  • Documentation of variable types and purpose is not encouraged
  • Use of functions and local scope is not encouraged
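As a hedged illustration of the points above, the sketch below contrasts the same calculation written notebook-style with module-level globals and written as a typed, documented function. The names `readings`, `scale`, and `scaled_mean` are hypothetical.

```python
from typing import Sequence

# Notebook style: every name lives at module level, so any later cell
# can silently overwrite `readings` or `scale`.
readings = [3.0, 5.0, 10.0]
scale = 2.0
scaled = [r * scale for r in readings]
mean = sum(scaled) / len(scaled)


# Structured style: inputs arrive as parameters, intermediates stay
# local, and the signature documents types and purpose.
def scaled_mean(values: Sequence[float], factor: float) -> float:
    """Return the mean of `values` after multiplying each by `factor`."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v * factor for v in values) / len(values)


result = scaled_mean([3.0, 5.0, 10.0], factor=2.0)
print(mean, result)
```

Both versions compute the same number, but only the function can be called twice with different inputs without worrying about what earlier cells left behind.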

No modularity

  • Most Notebook users do not know how to create a Python module or a class
  • Separation of concerns and separation of scope is not encouraged
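For readers who have not written a class before, here is one possible sketch of separation of concerns in Python. The `Measurement` and `MeasurementLog` names are invented for this example; in practice the code would sit in its own `.py` file and be imported into the notebook.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Measurement:
    """One immutable data point."""
    sensor: str
    value: float


class MeasurementLog:
    """Owns the data and exposes a small, named interface to it."""

    def __init__(self) -> None:
        self._items: List[Measurement] = []

    def add(self, sensor: str, value: float) -> None:
        self._items.append(Measurement(sensor, value))

    def average(self, sensor: str) -> float:
        values = [m.value for m in self._items if m.sensor == sensor]
        if not values:
            raise KeyError(f"no measurements for sensor {sensor!r}")
        return mean(values)


log = MeasurementLog()
log.add("a", 1.0)
log.add("a", 3.0)
log.add("b", 10.0)
print(log.average("a"))
```

Storage and computation are reachable only through `add` and `average`, so callers cannot depend on, or corrupt, the internal list.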

5 of 8

Grim Reality: Education & Jobs

Coding Schools

  • Data science coding schools DO NOT teach structured programming practices
  • Entire course is spent making spaghetti code in notebooks

Spoiling a good Developer

  • When an organization insists on using Notebooks, developers are discouraged from writing modules and classes and using proper engineering practices
  • People can be FIRED for writing structured code!

6 of 8

RESULT:

An entire generation of developers has NO IDEA what well structured code looks like

7 of 8

Implication for CTOs / Data Science Mgt

  1. Code coming out of the data science team is most likely poorly engineered, with “interesting” and “unusual” bugs
  2. Notebooks encourage documentation but do not require it -- most likely, it’ll be as poorly documented as code coming from the dev team
  3. ZERO test coverage
  4. UPSHOT: Code created in notebooks should be rewritten from scratch before it goes in production
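To make the test-coverage point concrete, here is a rough sketch of what a first step away from zero coverage could look like: logic pulled out of a cell into a function, plus a standard-library `unittest` case. The `normalize` function is a hypothetical example, not code from any particular notebook.

```python
import unittest


def normalize(values):
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("cannot normalize constant input")
    return [(v - low) / (high - low) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_maps_range_to_unit_interval(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_rejects_constant_input(self):
        with self.assertRaises(ValueError):
            normalize([3, 3, 3])


# Run the suite programmatically so the sketch also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the function lives in a module, the same test class can be picked up by `python -m unittest` in a build pipeline.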

8 of 8

Maybe…

Teach structured code in the first place and do it right the first time?