Status Report: Python
April 2016 - September 2016
Ryan May, Sean Arms, Julien Chastang, Ward Fisher
Activities Since the Last Status Report
Python Training Efforts
- We continue to improve and expand the Unidata Python Workshop, ensuring it stays current with the latest developments in the scientific Python ecosystem. Specifically, we enhance and add Jupyter Notebooks in the geoscientific domain, and we stay on top of infrastructure changes in the conda package manager and the Jupyter environment.
- Unidata obtained supplemental funds from NSF to start an online Python training effort specifically focused on serving the geoscience community. This was a solid pilot effort towards creating online Python training materials, but much remains to be done before it is a complete resource.
- We have started fostering a GitHub repository for collecting useful example notebooks for using Python in the geosciences. Kristen Pozsonyi, one of Unidata’s summer interns, helped spearhead an effort to turn these notebooks into a much better online gallery. We welcome community contributions to this repository.
- Sean Arms and Ryan May travelled to Madison, WI to present a pair of two-day workshops on Python for CIMSS/SSEC and the University of Wisconsin Atmospheric and Oceanic Sciences Department. The workshops had over 50 attendees, and the feedback was overwhelmingly positive. Lessons learned from these workshops are being used to improve the annual Python training materials.
- Thanks to the tutorial sessions at SciPy 2016, Ryan May is now a certified instructor for Software Carpentry (SWC). This led to his travelling to Texas Tech University to help give a SWC workshop sponsored in part by the TTU Atmospheric Science group. This was an excellent opportunity to teach git and introductory Python to a diverse group, which again has provided lessons on how to improve our annual Python workshop. This was also an opportunity to introduce TTU Atmospheric Science graduate students to MetPy, etc., as well as solicit contributions.
Progress has been made on the following:
- Creating introductory online Python training materials
- Continued improvement and refinement of the annual Python workshop materials, to the point that each workshop now requires less individual preparation
- Expansion of available training materials and in-person training offerings
- Creating a repository for gathering Jupyter notebooks created both internally and by the community as a learning resource
Dependencies, challenges, problems, and risks include:
- The amount of training has contributed to a slower pace of development on MetPy/Siphon, though this is not necessarily a bad thing.
MetPy
The MetPy community continues to grow slowly. There have been several externally driven Pull Requests, both for bug fixes and new features; the MetPy Twitter account has also reached 105 followers. Both of Unidata’s summer interns, Alex Haberlie and Kristen Pozsonyi, spent the majority of their time working on additions to MetPy. Kristen’s time was especially well-spent; as a beginner to Python, the challenges she experienced have helped identify areas where MetPy’s user experience needs improvement.
We also continue to improve MetPy’s open development model. Ideas for further development are captured in the GitHub issue tracker. Current plans focus on improving support for reading point data (both BUFR and raw METAR), which will facilitate generating NetCDF data for hosting on THREDDS, and on simplifying how MetPy works with units. The MetPy project is also investigating changes to its development process, such as greater use of milestones for planning, a time-based release schedule, and active solicitation of feedback on these milestones. These changes are motivated by the additional staff hired to work on Python and by the desire for more community input on priorities.
Progress has been made on the following:
- Continued investment in MetPy’s automated build, testing, documentation, and release infrastructure
- Kristen Pozsonyi added more calculations to MetPy
- MetPy now has capabilities (still considered beta-level in terms of API) for interpolating point data to a grid using a variety of methods, such as Barnes, Cressman, and Natural Neighbor (Thanks, Alex Haberlie!)
- The MetPy examples now show up as a gallery of images thanks to Kristen Pozsonyi’s hard work
- A calculation for sounding equilibrium layer (and curve intersection algorithm) was contributed by the community
- Community awareness and involvement progressing well one year into the project
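As a rough illustration of the point-to-grid interpolation mentioned above, the sketch below shows the distance-weighting idea behind the Cressman and Barnes methods in plain Python. It is not MetPy's implementation or API: the function names (`cressman_weight`, `barnes_weight`, `interpolate_point`, `grid_observations`) and the parameters `radius` and `kappa` are hypothetical names chosen for this example, and Natural Neighbor interpolation is not shown.

```python
import math


def cressman_weight(dist, radius):
    """Cressman weight: (R^2 - d^2) / (R^2 + d^2) inside the radius, else 0."""
    if dist >= radius:
        return 0.0
    r2, d2 = radius * radius, dist * dist
    return (r2 - d2) / (r2 + d2)


def barnes_weight(dist, kappa):
    """Barnes (Gaussian) weight: exp(-d^2 / kappa)."""
    return math.exp(-(dist * dist) / kappa)


def interpolate_point(x, y, obs, weight_func):
    """Weighted average of observations (xi, yi, value) at the point (x, y).

    Returns None when no observation receives a nonzero weight.
    """
    total_weight = 0.0
    total = 0.0
    for xi, yi, value in obs:
        w = weight_func(math.hypot(x - xi, y - yi))
        total_weight += w
        total += w * value
    if total_weight == 0.0:
        return None
    return total / total_weight


def grid_observations(obs, xs, ys, weight_func):
    """Evaluate the weighted average at every grid node; one row per y value."""
    return [[interpolate_point(x, y, obs, weight_func) for x in xs] for y in ys]


if __name__ == "__main__":
    # Two made-up observations on a line, gridded at three x positions.
    obs = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
    grid = grid_observations(obs, [0.0, 1.0, 2.0], [0.0],
                             lambda d: cressman_weight(d, radius=3.0))
    print(grid)
```

Passing the weighting scheme as a function keeps the gridding loop identical for both methods; only the influence of distance changes.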
Siphon
Siphon represents our official Python support for TDS. While development has been slow of late, this is largely because its current capabilities meet current needs (in contrast to MetPy, where some needs remain unmet). We anticipate developing Siphon to ensure that it is as easy as possible to download data from a TDS in Python (such as crawling a server looking for data), keeping pace with new features added on the Java side.
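To make the idea of crawling a server looking for data concrete, here is a minimal sketch using only the Python standard library. It is not Siphon's code; in practice Siphon's `TDSCatalog` class handles catalog parsing. The helper names (`parse_catalog`, `crawl`), the `fetch` callable, the `example.invalid` URLs, and the in-memory sample catalogs are all assumptions made for illustration, so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XML namespaces used by THREDDS catalog documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical stand-in for a server: catalog URL -> catalog XML text.
SAMPLE_CATALOGS = {
    "http://example.invalid/thredds/catalog.xml": (
        '<catalog xmlns="%s" xmlns:xlink="%s">'
        '<catalogRef xlink:href="radar/catalog.xml" xlink:title="Radar" name=""/>'
        "</catalog>" % (THREDDS_NS, XLINK_NS)
    ),
    "http://example.invalid/thredds/radar/catalog.xml": (
        '<catalog xmlns="%s" xmlns:xlink="%s">'
        '<dataset name="Radar files">'
        '<dataset name="scan1.nc" urlPath="radar/scan1.nc"/>'
        '<dataset name="scan2.nc" urlPath="radar/scan2.nc"/>'
        "</dataset>"
        "</catalog>" % (THREDDS_NS, XLINK_NS)
    ),
}


def parse_catalog(xml_text):
    """Return (names of datasets with a urlPath, hrefs of nested catalogs)."""
    root = ET.fromstring(xml_text)
    datasets = [el.get("name")
                for el in root.iter("{%s}dataset" % THREDDS_NS)
                if el.get("urlPath")]
    refs = [el.get("{%s}href" % XLINK_NS)
            for el in root.iter("{%s}catalogRef" % THREDDS_NS)]
    return datasets, refs


def crawl(start_url, fetch, max_depth=3):
    """Depth-limited crawl; returns (catalog URL, dataset name) pairs.

    `fetch` is any callable mapping a catalog URL to its XML text.
    """
    found, seen = [], set()
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        datasets, refs = parse_catalog(fetch(url))
        found.extend((url, name) for name in datasets)
        # Nested catalog hrefs are resolved relative to the current catalog.
        stack.extend((urljoin(url, ref), depth + 1) for ref in refs)
    return found


if __name__ == "__main__":
    start = "http://example.invalid/thredds/catalog.xml"
    for catalog_url, name in crawl(start, SAMPLE_CATALOGS.__getitem__):
        print(catalog_url, name)
```

Against a real server, `fetch` could be replaced with a function that downloads the catalog XML over HTTP; the depth limit and the `seen` set guard against very deep or circular catalog trees.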
External Participation
The Python team attends conferences and participates in other projects within the scientific Python ecosystem. This allows us to stay informed and to advocate for our community, as well as keep our community updated on developments. Ryan May has also continued to be an active participant in the matplotlib community, reviewing some pull requests and contributing several others. We also continue to host Jeff Whitaker’s netCDF4-python project repository; Jeff continues to be the active maintainer of the project. Ryan May is also now a member of the planning committee for the Python Symposium at the AMS Annual Meeting; he will also be presenting the Core Science Keynote for Python at the annual meeting, as well as helping bring the next generation of students into the fold by presenting on Python at the AMS Student Conference.
Progress has been made on the following:
- We have continued to evaluate xarray (formerly xray, created by Stephan Hoyer) as a way to get CDM-like functionality in Python. Its current capabilities provide a nice coordinate-aware data object, as well as a way to attach attributes to arrays. This project has become a general tool for scientific Python, elevated to a top-level project within PyData.
- We have participated in the conda-forge project on GitHub; this is a community project to produce automated builds of conda packages using open recipes and infrastructure. We have contributed (and maintain) recipes for MetPy and Siphon, as well as their dependencies. These packages are available from the conda-forge channel on anaconda.org.
Ongoing Activities
We plan to continue the following activities:
- “Python with Unidata Technologies” training workshop
- Maintaining Siphon as an official Python API for working with TDS
- Growing and developing MetPy as a community resource for Python in meteorology
- Continued participation in the scientific Python community
- Relevant matplotlib support and fixes
- Working with JupyterHub as a way to facilitate data-proximate analysis
- Continue regular series of notebook-based blog posts on the Unidata Developer’s blog to demonstrate the use of Python for various meteorological tasks
- As resources and time permit, continue making progress on the Online Python Training project by writing Jupyter notebooks specifically targeted towards teaching the geoscience community programming concepts. We have submitted an abstract to present this project at the AMS 2017 Annual Meeting in Seattle, WA.
New Activities
Over the next three months, we plan to organize or take part in the following:
- Unidata’s Annual Training workshop on “Using Python with Unidata Technologies”
- Help hold an AMS short course on accessing the NEXRAD archive in AWS at the 2017 AMS Annual Meeting
Over the next twelve months, we plan to organize or take part in the following:
- Restructure our annual Python workshop to be a full week with introduction to Python/git, intermediate with MetPy/Siphon/etc., and developer hack-day
- Attend SciPy 2017
- Python-related presentations by Unidata staff at the 2017 AMS Annual Meeting in Seattle, WA
Beyond a one-year timeframe, we plan to organize or take part in the following:
- Evaluate the possibility of extending Siphon functionality to interface with the AWIPS-II EDEX server
- Offer a version of our Python workshop as an AMS short course at the Annual Meeting in 2018
Areas for Committee Feedback
We are requesting your feedback on the following topics:
- In light of the success of the Madison Python workshop, should we be offering more regional Python workshops? Is anyone willing to help sponsor?
- Does offering our training at AMS (or other conferences) seem like a worthwhile effort in order to take more advantage of the opportunity presented by a gathering of our community? Are there other conferences in addition to AMS you would suggest?
- Are there any additions you’d like to make to MetPy’s or Siphon’s development roadmap?
- What are the biggest obstacles that you see to the use of Python with other Unidata technologies, or for use in meteorology in general?
- We continue to maintain the Unidata Python Workshop with fresh, relevant, and up-to-date content. However, we would welcome feedback from our committees on topics we may not be covering in the workshop.
Relevant Metrics
Siphon
- 95% test coverage
- 791 downloads/month from the Python package index
- Watchers: 10
- Since 1 April 2016
- Active Issues: 14 (5 created, 3 closed)
- Active PRs: 8 (7 created, 8 closed)
- External Issue Activity: 0 opened, 0 comments
- External PR Activity: 1 opened, 1 comment
- Unique external contributors: 1
- Stars: 5 (26 total)
- Commits: 31
- Since 1 October 2015
- Active Issues: 37 (24 created, 15 closed)
- Active PRs: 30 (28 created, 30 closed)
- External Issue Activity: 1 opened, 7 comments
- External PR Activity: 1 opened, 5 comments
- Unique external contributors: 6
- Stars: 12 (26 total)
- Commits: 169
MetPy
- 94% test coverage
- 1670 downloads/month from the Python package index
- Watchers: 22
- Since 1 April 2016
- Active Issues: 63 (50 created, 22 closed)
- Active PRs: 43 (41 created, 36 closed)
- External Issue Activity: 24 opened, 34 comments
- External PR Activity: 20 opened, 20 comments
- Unique external contributors: 19
- Stars: 22 (81 total)
- Commits: 135
- Since 1 October 2015
- Active Issues: 103 (70 created, 41 closed)
- Active PRs: 70 (69 created, 63 closed)
- External Issue Activity: 27 opened, 45 comments
- External PR Activity: 23 opened, 21 comments
- Unique external contributors: 24
- Stars: 38 (81 total)
- Commits: 288
Unidata Python Workshop
- Watchers: 24
- Since 1 April 2016
- Active Issues: 16 (14 created, 4 closed)
- Active PRs: 17 (17 created, 17 closed)
- External Issue Activity: 1 opened, 3 comments
- External PR Activity: 0 opened, 0 comments
- Unique external contributors: 1
- Stars: 15 (39 total)
- Commits: 64
- Since 1 October 2015
- Active Issues: 23 (21 created, 7 closed)
- Active PRs: 23 (23 created, 23 closed)
- External Issue Activity: 1 opened, 3 comments
- External PR Activity: 0 opened, 0 comments
- Unique external contributors: 1
- Stars: 19 (39 total)
- Commits: 98
Unidata Online Python Training
- Watchers: 4
- Since 1 April 2016
- Active Issues: 31 (21 created, 26 closed)
- Active PRs: 44 (44 created, 44 closed)
- External Issue Activity: 0 opened, 2 comments
- External PR Activity: 6 opened, 1 comment
- Unique external contributors: 2
- Stars: 0 (0 total)
- Commits: 76
- Since 1 October 2015
- Active Issues: 40 (40 created, 26 closed)
- Active PRs: 49 (49 created, 49 closed)
- External Issue Activity: 0 opened, 2 comments
- External PR Activity: 6 opened, 1 comment
- Unique external contributors: 2
- Stars: 0 (0 total)
- Commits: 89
Strategic Focus Areas
We support the following goals described in the Unidata Strategic Plan:
- Enable widespread, efficient access to geoscience data
Python can facilitate data-proximate computations and analyses through Jupyter Notebook technology. Jupyter Notebook web servers can be co-located with the data source for analysis and visualization through web browsers. This capability, in turn, reduces the amount of data that must travel across computing networks.
- Develop and provide open-source tools for effective use of geoscience data
Our current and forthcoming efforts in the Python arena will facilitate analysis of geoscience data. This goal will be achieved by continuing to develop Python APIs tailored to Unidata technologies. Starting with the summer 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort is now encapsulated in the Siphon project, which is an API for communicating with a THREDDS server. Moreover, Python coupled with HTML5 Jupyter Notebook technology has the potential to address "very large datasets" problems. Jupyter Notebooks can be co-located with the data source and accessed via a web browser, thereby allowing geoscience professionals to analyze data where the data reside without having to move large amounts of information across networks. This concept fits nicely with the "Unidata in the cloud" vision and the goals outlined in the Unidata 2018 Five-year plan. Lastly, as a general-purpose programming language, Python has the capability to analyze and visualize diverse data in one environment through numerous, well-maintained open-source APIs. The additional development of MetPy fills the need for domain-specific analysis and visualization tools in Python.
- Provide cyberinfrastructure leadership in data discovery, access, and use
The TDS catalog crawling capabilities found in Siphon will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world.
- Build, support, and advocate for the diverse geoscience community
Based on interest from the geoscience community, Unidata, as part of its annual training workshop, now hosts a three-day session to explore Python with Unidata technology. Also, to advance the use of NetCDF in Python, Unidata has promoted Jeff Whitaker’s netCDF4-python project, including hosting its repository under Unidata’s GitHub account. Unidata is initiating a project to provide online Python training specifically targeting geoscience students. Unidata is also fostering some community development of meteorology-specific tools under the MetPy project.
Prepared September 2016