AI2ES Coding Standards

Group Lead: David John Gagne

December 16, 2020

Introduction

  • AI2ES members will be developing software libraries collaboratively across multiple groups and institutions
  • Ideally we want to share software across the institute and release packages to the public
  • Coding standards will enable us to encourage/enforce a level of software quality across all shared repositories
  • However, some standards are easier to implement and enforce than others
  • Goals:
    • Discuss potential types of standards we should aim to encourage across the institute
    • What is necessary vs. nice to have vs. overly burdensome?

Python Packaging Structure

  • package-name/
    • README.md: Contains description of package, installation and use instructions
    • setup.py: Script for installing the package
    • LICENSE: License text (CC0)
    • environment.yml: List of dependencies for the conda installer
    • requirements.txt: List of packages for the pip installer
    • package/: Directory containing all python module files (.py)
      • test/: Contains all unit test files
    • doc/: Documentation directory
    • scripts/: Contains helper scripts and executable programs
    • notebooks/: Contains jupyter notebooks
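
A minimal sketch of generating the layout above as empty files and directories. The function name `create_package_skeleton` is hypothetical, and the `__init__.py` file is an assumption not listed on the slide (it is the usual way to mark the module directory as importable).

```python
from pathlib import Path


def create_package_skeleton(root, package):
    """Create the empty files and directories of the layout listed above."""
    root = Path(root)
    # Top-level files that describe, license, and install the package.
    top_level_files = ["README.md", "setup.py", "LICENSE",
                       "environment.yml", "requirements.txt"]
    # Directories for modules, unit tests, docs, scripts, and notebooks.
    directories = [package, f"{package}/test", "doc", "scripts", "notebooks"]
    for directory in directories:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in top_level_files:
        (root / filename).touch()
    # Assumption: __init__.py marks the module directory as a package.
    (root / package / "__init__.py").touch()
    # Return the created paths, relative to the root, for inspection.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Calling `create_package_skeleton("package-name", "package")` would create the tree in the current directory and return the list of relative paths.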


Coding Style

  • Goal: ensure that all code follows the same formatting conventions for a consistent look and meaning across packages
  • Python style guide: PEP 8
  • Major style areas
    • Variable naming convention: instance_or_function, ClassName
    • Equation spacing: c = a + b not c=a+b
    • Whitespace: indents are 4 spaces
  • Style can be checked and corrected with programs like PyCharm or with linter programs
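
A short sketch of the three style areas above applied together. The class and method names are hypothetical and chosen only for illustration.

```python
class TemperatureConverter:
    """Class names use the ClassName (CapWords) convention."""

    def __init__(self, offset=273.15):
        # Instance attributes use lowercase words joined by underscores.
        self.kelvin_offset = offset

    def celsius_to_kelvin(self, temp_celsius):
        # Indents are 4 spaces; binary operators get a space on each side
        # (c = a + b, not c=a+b).
        temp_kelvin = temp_celsius + self.kelvin_offset
        return temp_kelvin
```

A linter such as flake8 would flag `temp_kelvin=temp_celsius+self.kelvin_offset` or a 2-space indent in this file.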

Version Control

  • Version control: software that keeps track of changes to files and merges changed files together
  • Git: distributed version control software
  • GitHub: website that stores git repositories in a central location and provides project management and organization tools
  • Why use version control:
    • Keep track of changes in case you make a mistake and need to recover old code
    • Synchronize changes across multiple computers (edit code on laptop and sync with supercomputer)
    • Merge changes from different collaborators
    • Work on new ideas in different branches of same repository

https://xkcd.com/1597/

Testing

  • Code should be tested to ensure it works properly and to catch changes that break existing functionality
  • Types of tests
    • Unit tests: check functionality of single component
    • Integration tests: check that components work together properly
  • Testing framework: pytest
  • Challenges
    • Writing good tests can be challenging
    • Tests of ML and data-loading code need data to run on
    • Tests can’t cover all ways things go wrong
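
A minimal sketch of a unit test and an integration test written in the style pytest collects (functions named `test_*` that use plain `assert`). The functions `rescale` and `summarize` are hypothetical stand-ins for package code; the second test generates small synthetic data with a fixed seed, which is one possible way around needing a data file.

```python
import random


def rescale(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


def summarize(values):
    """Return the minimum and maximum of the rescaled values."""
    scaled = rescale(values)
    return min(scaled), max(scaled)


def test_rescale_known_values():
    # Unit test: a single component checked against a hand-computed answer.
    assert rescale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_summarize_on_synthetic_data():
    # Integration test: both components together, on synthetic data
    # generated with a fixed seed so no external file is required.
    rng = random.Random(0)
    values = [rng.uniform(-50.0, 50.0) for _ in range(100)]
    assert summarize(values) == (0.0, 1.0)
```

Saved in a file under the package's test/ directory, these would be picked up by running `pytest` from the repository root.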

Code Review

  • Different collaborators should work on different branches while implementing new features
  • When ready to share new code with everyone else, the developer should request a code review by the leads for that package
  • Code review should accomplish the following items:
    • Verify the code works
    • Check the style
    • Identify areas of confusion or unclear functionality
    • Identify potential performance bottlenecks
  • Art of Giving and Receiving Code Reviews Gracefully: https://www.alexandra-hill.com/2018/06/25/the-art-of-giving-and-receiving-code-reviews/

Pull Requests

Continuous Integration

  • Automated scripts that run whenever new changes are pushed to GitHub
  • Functions
    • Install all dependencies from scratch
    • Run test suite
    • Run test function
    • Check style, test coverage
    • Upload to package repository if everything passes
  • Frameworks
    • TravisCI
    • CircleCI
    • GitHub Actions

  • Benefits
    • Automatically runs after commits and pull requests
    • Can test multiple configurations of package
    • Catches breaking changes throughout pipeline
    • Emails you if something is broken
  • Drawbacks
    • Requires moderate effort for initial setup
    • Only as good as the tests are
    • Can cost money for private repos or if usage quota exceeded

Documentation

  • Code should be documented so people know how to use it properly and how it works
  • Levels of documentation
    • Docstrings: Placed at the beginning of a function; describe its purpose, inputs, and outputs, and give a usage example
    • Inline comments: Describe how a section of code works or why it is used
    • Tutorial: Describes how to use package through a step-by-step guide
    • Narrative documentation: describes motivation for code, history, science, broader context
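
A sketch of the first two documentation levels on one function. The NumPy-style section headings are an assumption (the slides do not prescribe a docstring format), and `mean_absolute_error` is a hypothetical example function.

```python
def mean_absolute_error(observed, predicted):
    """Compute the mean absolute error between two sequences.

    Parameters
    ----------
    observed : list of float
        Observed values.
    predicted : list of float
        Predicted values, in the same order and units as `observed`.

    Returns
    -------
    float
        Mean of the absolute differences.

    Examples
    --------
    >>> mean_absolute_error([1.0, 2.0, 3.0], [1.0, 4.0, 2.0])
    1.0
    """
    # zip() silently truncates to the shorter input, which would hide
    # mismatched data, so the lengths are checked explicitly first.
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("inputs must not be empty")
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)
```

The usage example in the docstring can be checked automatically with Python's built-in doctest module, so the documentation doubles as a small test.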

API vs. Narrative Documentation

Jupyter Notebooks

  • Interactive coding and visualization interface
  • Benefits
    • Load data and interact with it on multiple computing platforms
    • Merge docs and code together
    • Great for tutorials
    • Can run locally, HPC, cloud
    • Can convert notebooks to packages (nbdev)
  • Drawbacks
    • Can encourage spaghetti code
    • Errors caused by order of running cells
    • Need the Python environment set up correctly to work

Challenges

  • Participation in code review
    • Less experienced people can feel too intimidated to comment on pull requests
    • Need to encourage comments and have a positive environment for commenting
    • Avoid gatekeeping behavior or overly harsh criticism
  • Documentation
    • Always needed but can be tedious to write
    • Need feedback on documentation priorities
    • Documentation gaps become evident quickly when a beginner tries to use the code
  • Teaching Coding Standards
    • Can point everyone to tutorials
    • People need to get in the habit of practicing these tasks
  • Changing Standards
    • Software recommendations and fashions change over time
    • Balancing consistent guidance with adapting to new effective practices

Summary and Questions

  • More detailed documents in AI2ES coding standards working group folder
  • Add practices to this document
  • Questions:
    • What other coding practices should we use?
    • What is essential, nice to have, or overly burdensome?
    • How to incorporate science workflow and priorities into coding process?
    • Who wants to join the group?
  • Email me: dgagne@ucar.edu