Status Report: Python

April - September 2015

Ryan May, Sean Arms, Julien Chastang, Ward Fisher, Russ Rew

Strategic Focus Areas

We support the following goals described in the Unidata Strategic Plan:

  1. Enable widespread, efficient access to geoscience data
    Python can facilitate data-proximate computations and analyses through Jupyter Notebook technology. In particular, Jupyter Notebook web servers can be co-located with the data source, so that analysis and visualization happen through a web browser. This capability, in turn, reduces the amount of data that must travel across computing networks.
  2. Develop and provide open-source tools for effective use of geoscience data
    Our current and forthcoming efforts in the Python arena will facilitate analysis of geoscience data. This goal will be achieved by continuing to develop Python APIs tailored to Unidata technologies. Starting with the summer 2013 Unidata training workshop, we developed an API to facilitate data access from a THREDDS data server. This effort has been encapsulated in the new siphon project, which is an API for communicating with a THREDDS server. Moreover, Python coupled with HTML5 Jupyter Notebook technology has the potential to address "very large datasets" problems. In particular, a Jupyter Notebook can, in principle, be co-located with the data source and accessed via a web browser, thereby allowing geoscience professionals to analyze data where the data reside without having to move large amounts of information across networks. This concept fits nicely with the "Unidata in the cloud" vision. Lastly, as a general-purpose programming language, Python can analyze and visualize diverse data in one environment through numerous, well-maintained open-source APIs.
  3. Provide cyberinfrastructure leadership in data discovery, access, and use
    The TDS catalog crawling capabilities found in siphon will facilitate access to data remotely served by the Unidata TDS, as well as other TDS instances around the world. The goal of pyCDM is to construct a geoscience-focused data model in Python, based heavily on the netCDF-Java implementation of the Common Data Model (CDM). pyCDM is anticipated to provide a simple, pythonic API to the higher-level functionality of the FeatureType layer of the CDM.
  4. Build, support, and advocate for the diverse geoscience community
    Based on interest from the geoscience community, Unidata, as part of its annual training workshop, hosted a three-day session to explore “Python with Unidata technology”. Also, to encourage the use of netCDF in Python, Unidata has promoted Jeff Whitaker’s netCDF4-python project, including hosting its repository under Unidata’s GitHub account. Unidata is also fostering community development of meteorology-specific tools under the MetPy grassroots project.

Activities Since the Last Status Report

Training Workshop

The Python with Unidata Technologies workshop had 12 attendees and was again the best attended of all the training workshops. This year, we expanded the workshop to 3 days, mainly to improve the pacing of the material; this seemed to work out well. It is interesting to note how many users new to Python were in attendance. It is also interesting to note that attendance was dominated by IT staff and oceanographers; there were not many meteorologists in attendance.

In conjunction with the workshop, we also developed a Unidata Python Docker image. It contains a minimal conda distribution along with Python packages related to Unidata technology.

Siphon

Siphon represents a rebranding of PyUDL, as we try to elevate our Python support in TDS to a higher status. We anticipate developing Siphon to ensure that it is as easy as possible to download data from a TDS in Python, keeping pace with new features added on the Java side.
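
As a rough illustration of what reading a TDS catalog involves, the sketch below parses a THREDDS catalog document and lists the datasets and nested catalogs it advertises. It uses only the Python standard library and is not Siphon's implementation. The sample catalog, its dataset names and paths, and the helper list_catalog are hypothetical and exist only for this example. The catalog is inlined so the sketch runs without a network connection.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical catalog content (names and paths are made up for illustration).
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="Example catalog">
  <dataset name="Example model output">
    <dataset name="run_0000.nc" urlPath="example/model/run_0000.nc"/>
    <dataset name="run_1200.nc" urlPath="example/model/run_1200.nc"/>
  </dataset>
  <catalogRef xlink:href="radar/catalog.xml" xlink:title="Example radar data" name=""/>
</catalog>
""".strip()


def list_catalog(xml_text):
    """Return (datasets, catalog_refs) found in a THREDDS catalog document.

    datasets maps dataset name -> urlPath for datasets that can be accessed
    directly; catalog_refs maps title -> href for nested catalogs.
    """
    root = ET.fromstring(xml_text)

    datasets = {}
    for elem in root.iter("{%s}dataset" % THREDDS_NS):
        url_path = elem.get("urlPath")
        if url_path is not None:  # container datasets carry no urlPath
            datasets[elem.get("name")] = url_path

    catalog_refs = {}
    for elem in root.iter("{%s}catalogRef" % THREDDS_NS):
        title = elem.get("{%s}title" % XLINK_NS)
        catalog_refs[title] = elem.get("{%s}href" % XLINK_NS)

    return datasets, catalog_refs


datasets, catalog_refs = list_catalog(SAMPLE_CATALOG)
for name, url_path in sorted(datasets.items()):
    print(name, "->", url_path)
for title, href in sorted(catalog_refs.items()):
    print("nested catalog:", title, "->", href)
```

Crawling a server amounts to repeating this step for each nested catalog reference. Against a live TDS, Siphon's TDSCatalog class is intended to expose this kind of information without hand-written XML parsing.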

Progress has been made on the following:

JupyterHub

JupyterHub, part of Project Jupyter, is a multi-user Jupyter Notebook server with a highly pluggable design. In support of several cloud efforts (NOAA big data, server-side processing), Ryan May has developed a set of Docker images that support running a Unidata JupyterHub instance on Amazon EC2. Authentication of users is managed using GitHub (against a simple whitelist of allowed users). Users are sandboxed from each other (and the master system) through Docker, which allows spawning individual containers on a per-user basis. Facilities provided through the Jupyter Notebook interface include: uploading files (both notebooks and data), terminal access (for installing packages, including using git), and of course execution of Python 2 and 3 code (or potentially other kernels). The interface also works on tablets, giving a nice solution for doing Python analysis on a tablet.

We would like to start extending the testing of this server outside Unidata, to see how this capability solves issues of working with large remote datasets, as well as providing managed Python environments.

Progress has been made on the following:

Dependencies, challenges, problems, and risks include:

MetPy

After feedback from the last users’ committee meeting, a push was made to bring MetPy forward as a place for community collaboration on meteorology tools that fit within the rest of the scientific Python ecosystem (a.k.a. PyGempak). This project was announced for collaboration in late May with a blog post, followed by a presentation at the triennial workshop in June. Feedback has been quite positive, even beyond those who have participated on GitHub.

A presentation for the 2016 AMS Python symposium has been submitted, which will hopefully drive even further community interest in this project. Ideas for further development are outlined on GitHub.

Progress has been made on the following:

Dependencies, challenges, problems, and risks include:

External Participation

The Python team attends conferences as well as participates in other projects within the scientific Python ecosystem. This allows us to stay informed and to be able to advocate for our community, as well as keep our community updated on developments. Ryan May attended the 2015 SciPy conference in Austin; major takeaways:

Ryan May has also continued to be an active participant in the matplotlib community, reviewing some pull requests and contributing several others. We also continue to host Jeff Whitaker’s netCDF4-python project repository; Jeff continues to be the active maintainer of the project.

Progress has been made on the following:

Dependencies, challenges, problems, and risks include:

Ongoing Activities

We plan to continue the following activities:

New Activities

Over the next three months, we plan to organize or take part in the following:

Beyond a one-year timeframe, we plan to organize or take part in the following:

Areas for Committee Feedback

We are requesting your feedback on the following topics:

  1. What are the biggest obstacles that you see to the use of Python with other Unidata technologies, or for use in meteorology in general?
  2. How valuable do you find an effort like MetPy to the Python meteorology community? Are there additional barriers we could remove through this project? Are there other efforts over which this should take priority?

Relevant Metrics

Siphon (since April):

MetPy (since April):


Prepared September 2015