DataONE Federated Security Workshop Report
September 2010
Table of Contents
Introduction
Workshop Objectives
DataONE Overview
Problem Statement
Member Node Perspectives
Authentication Technology Choices
Phased Implementation
Next Steps
Open Issues
Introduction
On September 8-9, 2010, DataONE participants and collaborators gathered in Chicago to address federated security standards, management, and implementation in relation to the DataONE project. This report summarizes the outcomes of that workshop.
The workshop participants were Jon Auman, Jim Basney, Ed Bishop, Randy Butler, John Cobb, Tim DiLauro, Dale Hendrickson, Jeff Horsburgh, Matt Jones, David Kennedy, Ken Klingenstein, Kevin Murphy, Tom Sohre, and Dave Vieglais.
Workshop Objectives
DataONE Overview
DataONE and the Data Conservancy are the two DataNet program awards to date, funded by the National Science Foundation (NSF) Office of Cyberinfrastructure (OCI) and other NSF directorates for an initial five years, with plans for ongoing funding of operations. No further DataNet program awards have been announced yet. There is strong motivation for interoperability between DataNets, and identity management is a strong candidate focus area for that interoperability.
DataONE’s goal is to enable synthesis in earth observation sciences, providing a reliable, stable, and adaptive cyberinfrastructure. DataONE consists of coordinating nodes (CNs) and member nodes (MNs). Coordinating nodes provide cataloging services and are responsible for moving information between member nodes. Member nodes each typically support a specific domain science. The target number of member nodes is unknown but may grow from 6 initially to hundreds, serving tens of thousands of users.
DataONE has relationships with other NSF cyberinfrastructure projects, such as ESG, NEON, and OOI. The goal is not to collate all the data into one system but instead to enable existing and emerging data repositories to interoperate with DataONE.
DataONE is concerned with science data, science metadata, and system metadata. The DataONE system metadata captures dates, types, size, sources, replication attributes, associations, owner(s), and access rules. It associates the science metadata with the science data. DataONE requires a simple process for creating system metadata at member nodes.
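To make the role of system metadata concrete, the following is a minimal sketch of such a record in Python. It is not the DataONE schema: every class, field name, and the "public" subject convention are assumptions chosen only to mirror the attributes listed above (dates, types, size, sources, replication attributes, associations, owners, and access rules).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class AccessRule:
    # Hypothetical shape: a subject (user, group, or "public") and a permission.
    subject: str
    permission: str


@dataclass
class SystemMetadata:
    # All field names are illustrative assumptions, not the DataONE schema.
    identifier: str            # object this record describes
    object_type: str           # e.g. "science-data" or "science-metadata"
    size_bytes: int
    source_node: str           # member node holding the original copy
    owners: List[str]
    date_uploaded: datetime
    # Associations: identifiers of the data objects a metadata object documents.
    describes: List[str] = field(default_factory=list)
    replication_allowed: bool = True
    access_rules: List[AccessRule] = field(default_factory=list)

    def can_read(self, subject: str) -> bool:
        """Owners can always read; others need a matching read rule."""
        if subject in self.owners:
            return True
        return any(
            rule.permission == "read" and rule.subject in (subject, "public")
            for rule in self.access_rules
        )
```

A member node could create such a record when an object is first stored, which is the "simple process" the requirement calls for; the real record format would be defined by the DataONE architecture documents.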
DataONE aims to support multi-master replication of science metadata and system metadata across the coordinating nodes. To replicate science data, a CN asks an MN to retrieve a copy of the data from the source MN. Motivations for replication include archive/preservation/survivability and improved access (performance, accessibility, load balancing).
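The science data replication flow described above can be sketched as follows. This is an in-memory illustration only: the class and method names are hypothetical, and real nodes would communicate over the DataONE APIs rather than by direct method calls. The point it shows is that the CN coordinates and records the replica, while the data itself moves from MN to MN.

```python
class MemberNode:
    """Hypothetical member node holding objects in memory."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # identifier -> bytes

    def get(self, identifier):
        return self.objects[identifier]

    def replicate_from(self, source, identifier):
        # The target MN pulls the copy directly from the source MN.
        self.objects[identifier] = source.get(identifier)


class CoordinatingNode:
    """Hypothetical coordinating node that tracks where replicas live."""

    def __init__(self):
        self.replica_locations = {}  # identifier -> list of node names

    def register(self, identifier, node):
        self.replica_locations.setdefault(identifier, []).append(node.name)

    def request_replication(self, identifier, source, target):
        # The CN only asks; it never handles the science data itself.
        target.replicate_from(source, identifier)
        self.register(identifier, target)


source = MemberNode("mn-a")
target = MemberNode("mn-b")
cn = CoordinatingNode()

source.objects["obj1"] = b"observations"
cn.register("obj1", source)
cn.request_replication("obj1", source, target)
```

After the final call, both member nodes hold the object and the CN's catalog lists both locations.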
DataONE is nearing the end of a prototyping stage and moving to an evaluation phase. The version 1 public infrastructure will be released and stood up in the next 6 to 12 months. The coordinating nodes for this stage are at Oak Ridge, New Mexico, and Santa Barbara. Member nodes use different technologies that will be integrated via rich DataONE APIs and web interfaces.
More information about the DataONE system architecture and use cases is available at http://mule1.dataone.org/ArchitectureDocs.
Problem Statement
The DataONE project needs both a short-term and long-term strategy for authentication and authorization. Security requirements include:
Member Node Perspectives
Workshop participants presented details on member node requirements and operations. The slides from these presentations are available at ????.
Data access requirements across the member nodes can be categorized as:
Currently all data in CUAHSI and ESDIS is public. Dryad has mostly public data, with one-year data embargoes in some cases. All Dryad metadata is public. LTER data is public, with tracking to comply with NSF usage reporting requirements.
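As an illustration of how a one-year embargo might be evaluated when deciding whether an object is publicly readable, here is a minimal sketch. The function names are hypothetical, and it assumes the embargo is counted from a publication date, which the workshop did not specify.

```python
from datetime import date


def embargo_end(published: date, years: int = 1) -> date:
    """Return the date on which an embargo of `years` years ends."""
    try:
        return published.replace(year=published.year + years)
    except ValueError:
        # Published on Feb 29 and the target year is not a leap year.
        return date(published.year + years, 3, 1)


def is_public(published: date, embargoed: bool, today: date) -> bool:
    """Unembargoed data is public at once; embargoed data once the embargo ends."""
    return (not embargoed) or today >= embargo_end(published)
```

Under these assumptions, metadata would remain readable throughout, and only the data object itself would be withheld until the embargo end date.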
Two commonly used data management software packages are Metacat and DSpace.
Authentication Technology Choices
From the wide range of authentication technology choices surveyed in the Security Landscape presentation at the workshop, attendees narrowed their focus to four options considered most promising:
The Authentication Technology Matrix documents the different attributes that were considered for each of these technology choices. The following requirements were particularly critical to reaching consensus on technology choices:
As a result of considering these and other factors detailed in the Authentication Technology Matrix, the consensus recommendation is for DataONE to adopt a combination of InCommon and CILogon authentication technologies. Note that CILogon itself depends on InCommon.
CILogon today issues certificates that can be used both inside and outside web browsers, but an initial browser-based authentication via InCommon to CILogon is required. InCommon is expected to support non-browser applications in the future via Project Moonshot and related work. CILogon certificates can also be used by services/agents, using long-lived (one year) certificates and/or RFC 3820 proxy certificates.
Phased Implementation
Four implementation phases are envisioned:
Next Steps
Given the above authentication technology choices and phased implementation plan, the next steps are:
Open Issues
We conclude by identifying open issues raised during the workshop: