1 of 57

Contributing to Open Source

By Ravin Kumar

2 of 57

What we’ll cover

  • My open source story
  • What is open source
  • Your open source story
  • Ways to contribute
  • Technical (code) contributions
    • Finding a project
    • Issue Tickets
    • Your first PR
    • Local Dev environment
    • Code quality tools
    • Continuous Integration

3 of 57

Warning - These slides will go by fast

  • Presentation will be short (~30 minutes)
    • Treat them as take-home references, not as comprehensive notes
    • They’re meant to be googleable
  • Live coding! (30 minutes)
    • And questions!

4 of 57

My story

  • Open Source aficionado
  • Data person
  • Worked at SpaceX doing data stuff
  • Open Source contributor
    • Core Contributor to ArviZ, PyMC3, and PyMC4
  • Twitter: @canyon289
    • Slides are here!
  • Github: https://github.com/canyon289
  • Blog: http://canyon289.github.io/

5 of 57

What is Open Source?

6 of 57

What is open source?

  • Source that anyone can read
    • And usually modify and distribute software freely
  • A community of people dedicated to sharing ideas and open to collaborating with others

7 of 57

Notable Open Source Software (OSS)

  • Operating Systems
    • Linux
    • FreeBSD
  • Computer Languages
    • Python
    • Perl
    • PHP
  • Basically every data science library (in python)
  • Basically every javascript framework

8 of 57

“Free as in speech not as in beer”

  • “Free software” is an ambiguous term
  • Free as in beer
    • Software you can run for free but not modify
    • Free trials
    • Free phone apps
  • Free as in speech
    • This is open source
    • Software “ideas” (where ideas are code) that are freely distributed
    • Open Source Software and speech are the same thing

9 of 57

Not just software

10 of 57

Your Story

11 of 57

Who’s contributed to open source?

Who’s wanted to contribute to open source?

Who can contribute to open source?

Everyone can contribute

(even if you don’t code)

12 of 57

Why Contribute?

  • It’s good for others
    • Helps others expand their knowledge and ability
    • Encourages open dialogue and exchange
    • People get to work with you!
  • It’s good for you
    • Great for building a profile
    • Practical way to level up in skills
    • Basically free
      • Definitely cheaper than traditional degrees or MOOCs
    • People get to know you

13 of 57

14 of 57

Ways to Contribute

15 of 57

Ways to contribute (that don’t require code)

  • Make logos (Graphic Design)
  • Design websites (Front End development)
  • Share on social media (Marketing)
  • How-to guides, blog posts, books (Writing)
  • Proofread documentation (Editing)
  • Organize conferences and meetups, like PyLadies (Event Coordination)
  • Give money ($$$$) (or ask employers to donate)
  • Say thank you to developers! (just being a great person)

16 of 57

17 of 57

We’re going to focus on two paths

  • Bug Reporting
    • Called Issue Tickets on github
  • Code Contributions
    • Contributing to existing projects
    • Open sourcing your own code
  • Warning! This talk’s examples are python, data science, git, github focused
    • The principles remain largely the same across tools, languages, and communities

18 of 57

Three example projects

  • PyMC3 - Mature project at end of life
  • ArviZ - Newer release with users (On the upswing)
  • PyMC4 - Experimental (We’re still figuring it out)

19 of 57

My Core Contributor Projects

  • Bayesian Exploration and Model Criticism library
  • 4 core contributors
  • NumFOCUS Affiliated
  • Python library

20 of 57

My Core Contributor Projects

  • Bayesian Inference Library
  • Based on Theano
  • Probabilistic Programming in Python
  • NumFOCUS Sponsored
  • Python library

21 of 57

My Core Contributor Projects

  • Bayesian Inference Library
  • Based on Tensorflow
  • Still in development

PyMC4

22 of 57

Bug Reporting

23 of 57

Bug reports are very valuable

  • Open source devs can’t test on all hardware
  • Bug reports help identify issues
  • We know people are using the library!
    • Definitely feels good

24 of 57

Writing a good bug report

  • Submit report through appropriate channel
    • On github that’s usually through an issue
  • Include all relevant sections
    • Brief description of what happened
    • Minimal reproducible example
    • What the expected output should be
    • Versions of relevant software
  • Please be nice

https://github.com/canyon289/TwitterBotUSC/issues/2
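
For the "Versions of relevant software" section of a report, a small helper like the one below can collect the numbers to paste in. This is only a sketch: the function name `print_versions` and the tool list (git, python3, pip) are assumptions, so swap in whatever the project you are reporting against depends on.

```shell
# Print one line per tool. The tool list is an assumption; edit it
# to match the project you are reporting against.
print_versions() {
  for tool in git python3 pip; do
    if command -v "$tool" >/dev/null 2>&1; then
      "$tool" --version
    else
      echo "$tool: not installed"
    fi
  done
  uname -sr   # operating system name and release
}

print_versions
```

Paste the output into the issue as-is so maintainers can compare it against their own setup.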

25 of 57

Example of a bad bug report

  • Demanding tone
  • No way for developers to reproduce
  • No indication of what the expected result should be

https://github.com/canyon289/TwitterBotUSC/issues/1

26 of 57

How not to contribute

27 of 57

Code Contributions

28 of 57

Baseline Skills for Contributions (Once per lifetime)

In ranked order

  1. Command Line Git
  2. Ability to navigate code projects
  3. Knowledge of git workflows
  4. Awareness of software tooling

Note: Git is not the only version control system

but it is the most popular right now so we’ll focus on it

29 of 57

Starting on existing project (Once Per Project)

  • Fork repository
  • Clone your fork to your local machine
  • Setup local dev environment
  • Run tests to ensure everything worked
  • Add original project as a remote
    • git remote add upstream original_project_url
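
The once-per-project steps can be rehearsed without touching GitHub. The sketch below is a local sandbox under stated assumptions: two bare repositories stand in for the original project and for your fork (on GitHub the fork is made with the Fork button), and every name in it (`seed`, `original.git`, `fork.git`, `work`, the example identity) is hypothetical. Setting up the dev environment and running the tests are project specific, so they are left out.

```shell
# Everything happens in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the original project: one commit on main, published as a bare repo.
git init --quiet seed
(
  cd seed
  git config user.name "Example Dev"
  git config user.email "dev@example.com"
  echo "# demo project" > README.md
  git add README.md
  git commit --quiet -m "Initial commit"
  git branch -M main
)
git clone --quiet --bare seed original.git

# Fork repository: on GitHub this is the Fork button; here it is a bare copy.
git clone --quiet --bare original.git fork.git

# Clone your fork to your local machine; git names that remote "origin".
git clone --quiet fork.git work
cd work

# Add the original project as a second remote, conventionally called "upstream".
git remote add upstream "$sandbox/original.git"
git fetch --quiet upstream

# List both remotes to confirm the setup.
git remote -v
```

With a real project, the two `git clone --bare` lines disappear and the remote paths become the URLs of your fork and of the original repository.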

30 of 57

Making regular contributions (Each Pull Request)

Every contribution follows this general pattern

  1. Fetch latest version
  2. Make a branch
  3. Make code changes on your computer
  4. Commit changes
  5. Push to your remote
  6. Make a Pull Request
  7. Other developers provide feedback
  8. Go back to step 3 based on feedback
    1. Skip step 6
  9. Core devs or maintainers merge changes
  10. Success!!
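
Steps 1 to 5 map onto git commands roughly as follows. This is a sketch, not the one true workflow: it builds a local sandbox in which bare repositories stand in for the original project and your fork, and the repository names, the branch name `document-install`, and the example identity are all hypothetical. Steps 6 onward happen in the hosting site's web interface.

```shell
# Sandbox setup: local stand-ins for GitHub (all names are hypothetical).
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet seed
(
  cd seed
  git config user.name "Example Dev"
  git config user.email "dev@example.com"
  echo "# demo project" > README.md
  git add README.md
  git commit --quiet -m "Initial commit"
  git branch -M main
)
git clone --quiet --bare seed original.git      # the original project
git clone --quiet --bare original.git fork.git  # your fork
git clone --quiet fork.git work
cd work
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add upstream "$sandbox/original.git"

# 1. Fetch the latest version of the original project.
git fetch --quiet upstream

# 2. Make a branch that starts from the project's main branch.
git checkout --quiet -b document-install upstream/main

# 3. Make code changes on your computer.
echo "Install with pip." >> README.md

# 4. Commit changes.
git add README.md
git commit --quiet -m "Document installation"

# 5. Push to your remote (the fork, named origin).
git push --quiet origin document-install

# 6. Open the pull request in the hosting site's web interface.
# 8. After feedback, repeat steps 3 to 5 on the same branch;
#    the open pull request picks up the new commits.
```

Working on a dedicated branch instead of main keeps each pull request separate and leaves main free to follow the original project.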

31 of 57

The Bimodal Distribution of emotions

Bad News

  • Pull requests require numerous steps across multiple tools

Good News

  • The tools make coding much easier and headache free (mostly)
  • Same skills are needed for programming jobs
    • including enterprise grade data science

32 of 57

Baseline Skills

33 of 57

Most crucial skill - git

  • Git is the one thing other contributors can’t directly help you with
  • Get comfortable with git command line
    • Merge conflicts
    • Rebase
    • Working with multiple remotes
  • Specific Commands
    • Fetch
    • Pull
    • Merge
    • Rebase
    • Branch
    • Checkout

    • Commit
    • Add
    • Reset
    • Amend
    • Remote
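
Rebase and amend are the two commands in this list that tend to feel risky, so a throwaway repository is a safe place to try them. The sketch below assumes nothing beyond git itself; the file names, branch names, and commit messages are made up for illustration.

```shell
# Practice in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet .
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git branch -M main

# Start a feature branch and commit on it.
git checkout --quiet -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature file"

# Meanwhile main moves ahead.
git checkout --quiet main
echo "line two" >> notes.txt
git commit --quiet -am "Extend notes"

# Rebase: replay the feature commit on top of the new main.
git checkout --quiet feature
git rebase --quiet main

# Amend: reword the most recent commit instead of adding a new one.
git commit --quiet --amend -m "Add feature file (reworded)"

# Show the resulting history as a graph.
git log --oneline --graph --all
```

The two branches touch different files here, so the rebase finishes without a merge conflict; editing the same line on both branches is a good follow-up exercise for practicing conflict resolution.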

34 of 57

35 of 57

My git tools

  • Git Command Line
    • Already installed on Linux
    • OSX and Windows usually need some extra setup
  • Git Kraken
    • Only for visualization of git graphs
    • Git platform independent
    • https://www.gitkraken.com/

36 of 57

Code Tooling

  • Understand and recognize toolset in typical open source repository (repo)
  • These are some of the python specific ones
    • Continuous Integration
      • TravisCI, CircleCI, Azure Pipelines
    • Testing
      • Unittest, pytest, nose
    • Linting
      • Pylint, pydocstyle, black
    • Documentation tooling
      • Sphinx, autodoc
    • Code Packaging and distribution
      • Setuptools, wheels, pypi
    • Environment Isolation
      • Conda, Docker, virtualenv, pipenv
    • Terminals
      • Bash, Windows shell, Mac terminal

37 of 57

Navigating Github

  1. Figure out what can be better.
    1. Look through
      1. Issue Tickets
      2. Documentation
      3. Code
  2. Setup a local dev environment that’s isolated from everything else
    • Good projects will have recommendations on how to do this
  3. Run the test suite. Be sure everything passes

38 of 57

Once per project skills

39 of 57

Making your first code contribution

  • Git clone library to your local computer
  • Setup a local dev environment
    • Good projects will have instructions on how to do this
  • Run the test suite. Be sure everything passes
    • This includes linting
  • Add the original github repo as a remote
    • git remote add upstream git:upstream_repo

40 of 57

Every Pull Request

41 of 57

Making your first code contribution

  • If working off existing bug report make a comment that you’re starting work
  • Make code changes
    • Write tests and documentation too
  • Push changes to your fork
  • Make a pull request
  • Wait for Continuous Integration to pass
    • Fix issues if any come up
  • Wait for a code review from a human
    • Fix issues if any come up
  • Maintainers should merge!
  • Celebrate

42 of 57

The code process

43 of 57

Open sourcing your own library

44 of 57

Steps

  • Write something that works
  • Make your code environment reproducible
    • Requirements.txt, environment.yml etc
  • Write documentation
  • Write tests
    • You don’t have to but please do
  • Push to hosted version control
    • Github, gitlab, sourceforge
  • Share your work with everyone!
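
For the reproducible environment step, a pinned requirements file is often enough for a small Python library. The sketch below writes one by hand; the package names and version pins are illustrative placeholders rather than a tested combination, and the temporary directory is only there so the example does not touch a real project.

```shell
# Work in a throwaway directory so no real project is modified.
cd "$(mktemp -d)"

# Write a pinned requirements file; packages and versions are illustrative only.
cat > requirements.txt <<'EOF'
numpy==1.17.0
arviz==0.4.1
pymc3==3.7
EOF

# Anyone can then rebuild the same environment with:
#   python3 -m pip install -r requirements.txt
# An existing environment can be captured with:
#   python3 -m pip freeze > requirements.txt
cat requirements.txt
```

Pinning exact versions trades flexibility for repeatability, which is usually the right trade when someone else has to run your code for the first time.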

45 of 57

Suggestions for starting

46 of 57

Your first contribution playbook

Start small

  • Find a project
  • Look for a small thing you can improve in documentation, code, etc
  • Go through the steps to create a local dev env
  • Push changes
  • Work through any problems that arise
  • Work through code review from developers
  • Get the merge notification
  • Throw a party

47 of 57

Live Demo

(Or questions if demo fails)

48 of 57

Appendix

49 of 57

Finding a match

50 of 57

What’s important when finding a project

  • You
  • The project
  • The community

51 of 57

You

  • What do you want to get out of this?
  • What do you want to learn?
  • What do you value in a community?
  • How much time do you have?
  • What can you contribute?
  • How much patience and willingness to learn do you have?

52 of 57

Project

When picking a project pay attention to

  • Are you interested in what it is?
    • Language
    • Use Case
  • Is the project actively maintained?
  • Is it well maintained or is it “messy”?
    • Both present different opportunities
  • What are the needs?
    • More code?
    • Documentation development
    • Infrastructure development

53 of 57

Community

When picking a community pay attention to

  • Do the people fit your cultural values?
    • Look for code of conduct
    • Read through PRs and issue tickets

54 of 57

Why I picked my libraries

  • Was using PyMC3 at work
  • I wanted to learn Bayesian statistics more deeply
  • All tools were python based
  • An active community that spanned academia and industry
  • Lots of tutorials and documentation
  • Everyone was, and has been, really nice
  • NumFOCUS and PyData affiliations are great

55 of 57

56 of 57

57 of 57

Unedited recording of talk