Status Report: Python
September 2015 - March 2016
Ryan May, Sean Arms, Julien Chastang, Ward Fisher
Activities Since the Last Status Report
Siphon
Siphon represents a rebranding of PyUDL, as we try to elevate our Python support in TDS to a higher status. We anticipate developing Siphon to ensure that it is as easy as possible to download data from a TDS in Python, keeping pace with new features added on the Java side.
Progress has been made on the following:
- Expanded support for netCDF extended model through CDMRemote
- Support for CDMRemote v2 from TDS 5
- Community contribution of fixing Windows-only bug in NCSS handler
- Improved automated code quality checking with Codacy and QuantifiedCode
- Improved infrastructure of testing (including porting from nose to py.test)
- Changes to ensure automated documentation builds do not silently fail
- Addition of automated testing on Windows using AppVeyor
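To make the NCSS (NetCDF Subset Service) access mentioned above concrete, here is a minimal sketch of how a point request URL for a TDS can be assembled using only the Python standard library. This is not Siphon's API or implementation; the function name `build_ncss_point_url`, the server address, and the dataset path are hypothetical, and the query keys (`var`, `latitude`, `longitude`, `time`, `accept`) are assumed to match the NCSS parameters exposed by the target server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_ncss_point_url(base_url, dataset_path, variables, lat, lon, time,
                         accept="csv"):
    """Assemble an NCSS point-request URL (illustrative helper, not Siphon)."""
    query = [
        ("var", ",".join(variables)),  # comma-separated variable names
        ("latitude", lat),
        ("longitude", lon),
        ("time", time),                # ISO 8601 time string
        ("accept", accept),            # requested response format
    ]
    return "{}/ncss/{}?{}".format(
        base_url.rstrip("/"), dataset_path.lstrip("/"), urlencode(query)
    )


# Hypothetical server and dataset path, for illustration only.
url = build_ncss_point_url(
    "http://tds.example.edu/thredds/",
    "grib/example/Best",
    ["Temperature_isobaric"],
    40.0,
    -105.0,
    "2016-03-01T12:00:00Z",
)

# Round-trip the URL to inspect what would be sent to the server.
parts = urlsplit(url)
params = parse_qs(parts.query)
print(parts.path)
print(params)
```

A library such as Siphon wraps this kind of URL construction, plus the request and response parsing, so that users do not have to build query strings by hand.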
MetPy
Feedback on MetPy continues to be positive. We have been publicizing the project in a variety of ways:
- Ryan May presented on MetPy at the Python Symposium at the 2016 AMS Annual Meeting.
- The Unidata Developers’ blog had a two-part post using Siphon and MetPy to plot GINI satellite data.
- The MetPy Twitter account has reached 41 followers.
- Ryan May and Sean Arms (with input from Kevin Goebbert) are scheduled to present a tutorial on MetPy and Siphon at the Software Engineering Assembly conference.
There have been some external contributions from the community, including a Pull Request adding a function to reduce sounding data to a desired set of data. Ideas for further development are captured in the GitHub issue tracker; current plans are to focus on enhancing the abilities to read point data (to facilitate generating NetCDF data for hosting on THREDDS) and to objectively analyze such data.
Progress has been made on the following:
- Continued investment in automating build, testing, documentation, and release infrastructure using Travis, Codacy, QuantifiedCode, and Read The Docs
- Added image-based tests for automated testing of plotting functionality
- Added support for reading GINI satellite imagery, including a netCDF-like interface
- Implemented station plots including weather symbols
- Improvements to Skew-T plots and added hodographs
- Community awareness and involvement progressing well one year into the project
Dependencies, challenges, problems, and risks include:
- Because little staff time is dedicated to MetPy, progress on adding features has been slower than desired.
External Participation
The Python team attends conferences and participates in other projects within the scientific Python ecosystem. This allows us to stay informed, to advocate for our community, and to keep our community updated on developments. Ryan May has also continued to be an active participant in the matplotlib community, reviewing some pull requests and contributing several others. We also continue to host Jeff Whitaker’s netCDF4-python project repository; Jeff continues to be the active maintainer of the project.
Progress has been made on the following:
- Fixed wind barbs for the upcoming 2.0 release of matplotlib
- Have continued to evaluate xarray (formerly xray, created by Stephan Hoyer) as a way to get CDM-like functionality in Python. Its current abilities provide a nice coordinate-aware data object, as well as a way to attach attributes to arrays. This project has become a general tool for scientific Python, elevated to a top-level project within PyData.
- Participated in the conda-forge project on GitHub; this is a community project to produce automated builds of conda packages using open recipes and infrastructure. We have contributed (and maintain) recipes for MetPy and Siphon, as well as their dependencies. These packages are available from the conda-forge channel on anaconda.org.
Unidata Python Workshop
We continue to improve and expand this popular workshop, now three days long, with new material, ensuring it stays current with the latest developments in the scientific Python area. Specifically, we enhance and add Jupyter Notebooks in the geoscientific domain, and we keep up with infrastructure changes in the conda package manager and the Jupyter environment.
Python Online Training Effort
Unidata obtained supplemental funds from NSF as part of our five-year award to start an online Python training effort specifically focused on serving the geoscience community. The Unidata Python group has outlined sections for introductory material, and is currently working towards having a draft resource in the early summer.
Ongoing Activities
We plan to continue the following activities:
- “Python with Unidata Technologies” training workshop
- Maintaining Siphon as an official Python API for working with TDS
- Growing and developing MetPy as a community resource for Python in meteorology
- Continued participation in the scientific Python community
- Relevant matplotlib support and fixes
- Working with JupyterHub as a way to facilitate data-proximate analysis
- Continuing a regular series of notebook-based blog posts on the Unidata Developers’ blog to demonstrate the use of Python for various meteorological tasks
New Activities
Over the next three months, we plan to organize or take part in the following:
- Using supplemental funds from NSF, develop asynchronous training materials for Python in meteorology. We are investigating the use of a cloud server hosting executable Jupyter Notebooks (based on our training workshop) as the core of the training materials, using either the tmpnb or jupyterhub packages from Project Jupyter.
- CIMSS/SSEC has asked Unidata to present our Python Training Workshop in a pair of two-day workshops in Madison, WI in June.
- Investigate having THREDDS communicate directly with a Jupyter/IPython kernel for server-side processing functionality
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Evaluate the possibility of extending Siphon functionality to interface with the AWIPS-II EDEX server
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- What are the biggest obstacles that you see to the use of Python with other Unidata technologies, or for use in meteorology in general?
- How valuable do you find an effort like MetPy to the Python meteorology community? Are there additional barriers we could remove through this project? Are there other efforts over which this should take priority?
- Have you seen the two-part blog post using MetPy and Siphon? How was the tone for a “notebook of the week”-type series?
- We continue to maintain the Unidata Python Workshop with fresh, relevant, and up-to-date content. However, we would welcome feedback from our committees on topics we may not be covering in the workshop.
Relevant Metrics
Siphon:
- 95% test coverage
- 357 downloads/month from the Python package index
- Watchers: 10
- Since 1 October 2015:
- Active Issues: 31 (19 created, 12 closed)
- Active PRs: 23 (21 created, 22 closed)
- External Issue Activity: 1 opened, 7 comments
- External PR Activity: 0 opened, 4 comments
- Unique external contributors: 5
- Stars: 6 (20 total)
- Commits: 138
- Active Issues: 55 (55 created, 31 closed)
- Active PRs: 41 (41 created, 40 closed)
- External Issue Activity: 6 opened, 16 comments
- External PR Activity: 4 opened, 8 comments
- Unique external contributors: 11
- Stars: 14 (20 total)
- Commits: 250
MetPy:
- 93% test coverage
- 798 downloads/month from the Python package index
- Watchers: 11
- Since 1 October 2015:
- Active Issues: 53 (20 created, 19 closed)
- Active PRs: 29 (28 created, 27 closed)
- External Issue Activity: 3 opened, 11 comments
- External PR Activity: 3 opened, 1 comment
- Unique external contributors: 8
- Stars: 14 (58 total)
- Commits: 153
- Active Issues: 77 (75 created, 41 closed)
- Active PRs: 58 (58 created, 56 closed)
- External Issue Activity: 11 opened, 35 comments
- External PR Activity: 6 opened, 2 comments
- Unique external contributors: 15
- Stars: 53 (58 total)
- Commits: 478
Unidata Python Workshop (from GitHub):
- Watchers: 28
- Stars: 24
- Forks: 49
- Issues: 45 (40 created, 5 closed)
- Pull Requests: 19 (0 open, 19 closed)
- Master Branch Commits: 545
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
Python can facilitate data-proximate computations and analyses through Jupyter Notebook technology. Jupyter Notebook web servers can be co-located with the data source for analysis and visualization through web browsers. This capability, in turn, reduces the amount of data that must travel across computing networks.
- Develop and provide open-source tools for effective use of geoscience data
Our current and forthcoming efforts in the Python arena will facilitate analysis of geoscience data. This goal will be achieved by continuing to develop Python APIs tailored to Unidata technologies. Starting with the summer 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort has been encapsulated in the new Siphon project, which is an API for communicating with a THREDDS server. Moreover, Python technology coupled with the HTML5 Jupyter Notebook technology has the potential to address "very large datasets" problems. Jupyter Notebooks can be co-located with the data source and accessed via a web browser, thereby allowing geoscience professionals to analyze data where the data reside without having to move large amounts of information across networks. This concept fits nicely with the "Unidata in the cloud" vision and the goals outlined in the Unidata 2018 five-year plan. Lastly, as a general-purpose programming language, Python has the capability to analyze and visualize diverse data in one environment through numerous, well-maintained open-source APIs. The additional development of MetPy fills the need for domain-specific analysis and visualization tools in Python.
- Provide cyberinfrastructure leadership in data discovery, access, and use
The TDS catalog crawling capabilities found in Siphon will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world.
- Build, support, and advocate for the diverse geoscience community
Based on interest from the geoscience community, Unidata, as part of its annual training workshop, now hosts a three-day session to explore Python with Unidata technology. Also, to advance the use of netCDF in Python, Unidata has promoted Jeff Whitaker’s netCDF4-python project, including hosting its repository under Unidata’s GitHub account. Unidata is also fostering some community development of meteorology-specific tools under the MetPy grassroots project.
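As an illustration of the TDS catalog crawling mentioned under the cyberinfrastructure goal above, the following is a minimal sketch of reading a THREDDS catalog document with the Python standard library. It is not Siphon's implementation; the function `list_catalog` and the inline catalog (names, paths, and the nested catalog reference) are hypothetical and written by hand for this example. Only the XML namespaces are taken from the THREDDS catalog format.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by THREDDS InvCatalog 1.0 documents and XLink.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical, hand-written catalog used only for illustration.
SAMPLE_CATALOG = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink" name="Example">
  <dataset name="Example Model Output">
    <dataset name="run_0000.nc" urlPath="example/model/run_0000.nc"/>
    <dataset name="run_1200.nc" urlPath="example/model/run_1200.nc"/>
    <catalogRef xlink:href="older/catalog.xml" xlink:title="Older runs" name=""/>
  </dataset>
</catalog>
"""


def list_catalog(xml_text):
    """Return (datasets, catalog_refs) found in a catalog document.

    datasets maps dataset name to urlPath for datasets that can be accessed
    directly; catalog_refs lists the hrefs of nested catalogs that a crawler
    would fetch and process next.
    """
    root = ET.fromstring(xml_text)
    datasets = {
        el.get("name"): el.get("urlPath")
        for el in root.iter("{%s}dataset" % THREDDS_NS)
        if el.get("urlPath") is not None  # skip pure container datasets
    }
    catalog_refs = [
        el.get("{%s}href" % XLINK_NS)
        for el in root.iter("{%s}catalogRef" % THREDDS_NS)
    ]
    return datasets, catalog_refs


datasets, catalog_refs = list_catalog(SAMPLE_CATALOG)
print(datasets)
print(catalog_refs)
```

A crawler would repeat this step for each entry in `catalog_refs`, resolving relative hrefs against the URL of the catalog that contained them.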
Prepared March 2016