Status Report: Python
September 2018 - March 2019
Ryan May, John Leeman, Sean Arms, Julien Chastang, Michael James
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- How can Unidata’s Python training better serve your needs or the needs of your students? Are there other topics we need to add to the workshop? Are there additional opportunities (e.g. conferences) we should explore as convenient venues?
- What are the most useful functionalities in MetPy and Siphon for your needs? What do we do well?
- Are there any additions you’d like to make to MetPy’s roadmap? Anything you notice as lacking in MetPy or Siphon?
Activities Since the Last Status Report
Staffing Changes
John Leeman will unfortunately be stepping down from Unidata in May 2019. A search for a new Python Developer, with an emphasis on instructional skills and experience, has begun. We hope to have the search completed by the end of Spring 2019. Additionally, Ryan May has assumed more responsibilities by joining Unidata’s Management Team, which requires about 25% of his time. These changes and their supporting activities have reduced the bandwidth of the Python development team. We do not expect lasting impacts on our capacity, pending successful completion of the hire.
Python Training Efforts
Python training efforts continue to be a valuable portion of the Python portfolio. We continue to be successful in identifying opportunities to offer training within our resource constraints. Not only do these generate significant goodwill and grow our audience, but they are also a significant source of feedback that informs our library development. One challenge is balancing the time dedicated to creating training materials, workshop preparation, and logistics against the time devoted to support and Python software development. We have also begun looking at ways to unify our various training efforts (gallery, workshop, and online Python training) into a proper “Python Training Portal”. The hope is to make community contributions simpler and to turn this into a better resource for the community’s Python training needs.
Progress has been made on the following:
- Ryan May and John Leeman, together with Kevin Goebbert, taught a short course using MetPy at the 2019 AMS Annual Meeting, focused on synoptic meteorology. The course was taught to a sold-out audience of 22 attendees. We plan to propose another course for AMS 2020.
- Ryan May, Sean Arms, and Howard Van Dam taught a 1-day workshop at Metro State University in Denver in March to 23 attendees.
- Ryan May and John Leeman taught a regional workshop at Valparaiso University on 14-15 March. This workshop also included a Saturday morning “Hack Day”.
- Ryan May and John Leeman will teach a workshop on testing in Python at the NCAR-hosted Conference on Improving Scientific Software in April 2019.
- John Leeman continues to lead the “MetPy Mondays” effort. These weekly screencasts on the Unidata Developers’ blog receive a lot of attention and feedback. Creating them also often uncovers improvements for our software. Unidata’s YouTube channel has had 49.6k minutes of watch time in the last year (up 108% from the previous year), 17.4k views (up 110%), 242 new subscribers (up 105%), and the most popular video has received over 2000 views. We welcome additional community screencasts and suggested topics.
MetPy
MetPy continues to grow, both in features and in community. The volume of support requests is the most remarkable area of growth; traffic and activity across GitHub, E-Support/E-mail, and Stack Overflow are steady. Community code contributions have been somewhat slower; this may be due in part to less active solicitation of contributions, owing to the uptick in support requests and the aforementioned staffing changes.
Development going forward will continue to be driven by requirements for our dedicated awards (in addition to bug reports and pull requests from community members). The primary efforts will be focused around the GEMPAK-like interface, improved units support, integration with xarray, and data formats. We do anticipate the release of MetPy 1.0 this year. Also, to try to foster more community discussion of MetPy’s goals and plans, we have published a general MetPy roadmap that tries to capture our plans from GitHub in a more friendly format.
Progress has been made on the following:
- Community awareness continues to grow, with the volume of engagement and mentions on social media growing; the MetPy Twitter account has reached 749 followers (29% growth in 6 months).
- MetPy 0.9.2 was released with a few minor bug fixes.
- MetPy 0.10 was released, including the initial GEMPAK-like plotting functionality, more xarray integration, and various calculation enhancements.
- Work continued towards the requirements of MetPy-related NSF awards.
Siphon and Data Processing
Siphon continues to grow and develop, though at a slower pace than MetPy; its development tends to be driven by obstacles to accessing remote data. The most pressing developments we anticipate for Siphon are improvements to working with Siphon in interactive sessions, such as the Jupyter notebook environment: an improved catalog crawling interface, better string representations, and tab completion. Siphon continues to see community contributions trickle in. We hope to have one of Unidata’s summer interns contribute to Siphon as part of their summer work.
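As a rough illustration of what "better string representations" and "tab completion" can mean in an interactive session, the sketch below uses a hypothetical DatasetListing class. It is not Siphon's code or API, and the dataset names and URLs are invented for the example. It relies only on two standard hooks: Python's __repr__ and IPython's _ipython_key_completions_, which Jupyter consults when completing listing["<TAB>.

```python
class DatasetListing:
    """Hypothetical container mapping dataset names to access URLs.

    Illustrative only; this is not part of Siphon.
    """

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getitem__(self, name):
        return self._datasets[name]

    def __len__(self):
        return len(self._datasets)

    def __repr__(self):
        # Show a short, readable preview instead of the default
        # "<... object at 0x...>" text.
        names = sorted(self._datasets)
        preview = ", ".join(names[:3])
        if len(names) > 3:
            preview += ", ..."
        return f"<DatasetListing: {len(names)} datasets [{preview}]>"

    def _ipython_key_completions_(self):
        # IPython/Jupyter calls this hook to offer key completions
        # when the user types listing["<TAB>
        return sorted(self._datasets)


# Made-up names and placeholder URLs, for demonstration only.
listing = DatasetListing({
    "GFS_Global_0p5deg_20190301_0000.grib2": "https://example.invalid/gfs/0000",
    "GFS_Global_0p5deg_20190301_0600.grib2": "https://example.invalid/gfs/0600",
})
print(listing)
```

In a notebook, evaluating listing on its own line would show the compact summary, and typing listing[" followed by Tab would list the available dataset names.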
We also continue to maintain the LDM Alchemy repository as a collection of LDM processing scripts in Python. Currently this includes the code powering the AWS NEXRAD archive as well as the program that reconstitutes NOAAPORT GOES-16/17 imagery. As we transition more of our internal data processing to Python, this repository will hold those scripts. We have seen several community questions regarding both the GOES and NEXRAD processing software.
External Participation
The Python team attends conferences and participates in other projects within the scientific Python ecosystem. This allows us to stay informed and to advocate for our community, as well as keep our community updated on developments. As participants in a broader Open Source software ecosystem, the Python team regularly encounters issues in other projects relevant to our community’s needs. As such, we routinely engage these projects to address challenges and submit fixes. We also continue to host Jeff Whitaker’s netCDF4-python project repository; Jeff continues to be the active maintainer of the project. This overall involvement helps ensure that important portions of our community’s Python stack remain well-supported. Ryan May continues to serve as a core developer for CartoPy as well as a member of Matplotlib’s Steering Council.
Progress has been made on the following:
- Ryan May was invited to again present on MetPy at the “Workshop on developing Python frameworks for earth system sciences” hosted by ECMWF in October 2018.
- We continue to engage with the Pangeo project, a grass-roots effort to develop a community stack of tools serving the atmospheric, oceanic, land, and climate sciences. This engagement is enhanced by work on the Pangeo EarthCube award, which will likely drive some contributions to the xarray project.
- Ryan May served as the release manager for the CartoPy 0.17 release.
- We also continue to actively engage with the xarray, numpy, and pint projects.
Python for AWIPS
We continue to update the Python Data Access Framework (python-awips) package with the latest changes from the AWIPS baseline. This package is used in both AWIPS and GEMPAK for remote retrieval of AWIPS data (grids, geometries, and imagery), as well as independently in Jupyter Notebooks.
Changes to python-awips since the last report include (through release 18.1.7):
- New functions DataAccessLayer.getMetarObs() and DataAccessLayer.getSynopticObs() added to process retrieved surface parameters into a dictionary.
- Added a new class awips.dataaccess.ModelSounding() to request vertical soundings from any available AWIPS model with isobaric data levels.
- New methods added to DataAccessLayer called getRadarProductNames() and getRadarProductIDs() to return either names or numerical IDs from the list of products available for the radar datatype.
- Added GEMPAK-specific scripts for processing data from EDEX to GEMPAK/NMAP2 display.
- Better control over Python 3 bytestring encoding and Python 2 unicode.
- New Jupyter Notebooks using python-awips:
Ongoing Activities
We plan to continue the following activities:
- Unidata Python training workshop
- Growing Siphon as a tool for remote data access across a variety of services
- Growing and developing MetPy as a community resource for Python in meteorology
- Continued participation in the scientific Python community as advocates for the atmospheric science community
- Working with JupyterHub as a way to facilitate data-proximate analysis
- MetPy Mondays for engaging the community
- As resources and time permit, continue growing the Online Python Training project by writing Jupyter notebooks specifically targeted towards teaching the geoscience community programming concepts.
New Activities
Over the next three months, we plan to organize or take part in the following:
- Teach a Python workshop for Unidata/UCAR/NCAR interns
- Teach additional Python regional workshops
- Attend SciPy 2019
Over the next twelve months, we plan to organize or take part in the following:
- Teach another short course on MetPy at AMS 2020
- Present annual update on Python libraries at AMS 2020
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Evaluate the possibility of extending Siphon functionality to interface with the AWIPS-II EDEX server
- Restructure our annual Python training materials into a more unified Python training portal
Relevant Metrics
MetPy
- 98% test coverage
- Watchers: 51
- Downloads for the releases made in the last year (only Conda for now):
- 0.8.0: 5314
- 0.9.0: Not released
- 0.9.1: 3216
- 0.9.2: 8405
- 0.10.0: 2661
- Active Issues: 74 (43 created, 21 closed)
- Active PRs: 38 (33 created, 30 closed)
- External Issue Activity: 24 opened, 50 comments
- External PR Activity: 14 opened, 9 comments
- Unique external contributors: 31
- Stars: 76 (380 total)
- Forks: 0 (133 total)
- Commits: 93
- Active Issues: 202 (126 created, 100 closed)
- Active PRs: 149 (126 created, 121 closed)
- External Issue Activity: 63 opened, 194 comments
- External PR Activity: 35 opened, 69 comments
- Unique external contributors: 62
- Stars: 145 (380 total)
- Forks: 3 (156 total)
- Commits: 384
Siphon
- 97% test coverage
- Watchers: 14
- Downloads for the last year (only Conda for now):
- Active Issues: 17 (14 created, 1 closed)
- Active PRs: 7 (5 created, 5 closed)
- External Issue Activity: 7 opened, 13 comments
- External PR Activity: 2 opened, 0 comments
- Unique external contributors: 8
- Stars: 11 (90 total)
- Forks: 1 (35 total)
- Commits: 12
- Active Issues: 55 (37 created, 30 closed)
- Active PRs: 39 (38 created, 37 closed)
- External Issue Activity: 16 opened, 40 comments
- External PR Activity: 7 opened, 8 comments
- Unique external contributors: 22
- Stars: 29 (90 total)
- Forks: 1 (35 total)
- Commits: 133
Python-AWIPS
- Downloads for the last month: 641
- Downloads for 2018: 3,455
- Downloads for the last 12 months: 5,387
- All-time downloads: 14,023
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
Python can facilitate data-proximate computations and analyses through Jupyter Notebook technology. Jupyter Notebook web servers can be co-located with the data source for analysis and visualization through web browsers. This capability, in turn, reduces the amount of data that must travel across computing networks.
- Develop and provide open-source tools for effective use of geoscience data
Our current and forthcoming efforts in the Python arena will facilitate analysis of geoscience data. This goal will be achieved by continuing to develop Python APIs tailored to Unidata technologies. Starting with the summer 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort has been encapsulated in the new Siphon project, which is an API for accessing remote data, including the THREDDS data server. Moreover, Python technology coupled with the HTML5 Jupyter Notebook technology has the potential to address "very large datasets" problems. Jupyter Notebooks can be co-located with the data source and accessed via a web browser, thereby allowing geoscience professionals to analyze data where the data reside without having to move large amounts of information across networks. This concept fits nicely with the "Unidata in the cloud" vision and the goals outlined in the Unidata 2018 Five-year plan. Lastly, as a general-purpose programming language, Python has the capability to analyze and visualize diverse data in one environment through numerous, well-maintained open-source APIs. The additional development of MetPy fills the need for domain-specific analysis and visualization tools in Python.
- Provide cyberinfrastructure leadership in data discovery, access, and use
The TDS catalog crawling capabilities found in Siphon will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world.
- Build, support, and advocate for the diverse geoscience community
Based on interest from the geoscience community, Unidata, as part of its annual training workshop, now hosts a three-day session to explore Python with Unidata technology. Also, to advance the use of NetCDF in Python, Unidata has promoted Jeff Whitaker’s NetCDF4-python project, including hosting its repository under Unidata’s GitHub account. Unidata is initiating a project to provide online Python training specifically targeting geoscience students. Unidata is also fostering some community development of meteorology-specific tools under the MetPy project.
Prepared March 2019