Creating an open soil profile data collection for Canada

This document serves to develop and share the approach used by the Canadian Digital Soil Data Consortium (CDSDC) to create and archive a soil profile data collection for Canada.  

Feel free to add or edit content. 

OBJECTIVES

The purpose of the proposed soil profile component of the CDSDC is to develop and maintain a single, easily accessible platform for contributing, collating, harmonizing, archiving, and providing open access to geo-referenced field soil profile descriptions and accompanying laboratory analytical data for Canada.

RATIONALE

Many organizations and companies collect soil profile information on an ongoing basis in Canada as part of environmental impact assessments, agricultural analyses, and engineering studies. However, there is currently no active centralized platform or organization to collate or host these diverse data sets. If this information could be gathered, harmonized and shared in an open data repository, it could provide useful background information for future studies and support a host of other applications.

The rationale for creating a soil profile data collection for Canada can be summed up in two main points. The first is that an open, shared collection of soil profile data can represent an enormous public good. The second is that sharing soil profile data widely and openly could serve the self-interest of companies and agencies that collect, produce or use soil or landscape information by reducing their costs and improving the quality and efficiency of their information products.

From a public good perspective, it is wasteful and counterproductive not to be able to reuse the many thousands of field soil profile observations or analysed soil sample data collected by diverse agencies and companies. Presently, most field observations and associated analytical data are gathered to satisfy regulatory requirements for development or extraction projects. These data are tabulated and presented in reports and appendices, and normally end up as PDF or paper files stored in a file cabinet or on an inaccessible server. The data are not easily discovered or accessed and so are rarely reused to benefit subsequent activities.

From the perspective of private-sector companies, there are enormous potential benefits from being able to reduce, reuse and recycle field observation and laboratory data. If previously collected soil data were easy to discover and obtain, companies could reduce the amount of unnecessary or redundant data collection. Sampling plans for collecting new data could take already existing data into account to maximize the impact and efficiency of any new data collection. Existing data could be reused and recycled by incorporating it into both the initial planning for new surveys and the production of any new maps or spatial products, enabling companies to produce better products at lower cost. Regulatory agencies could also make use of existing data to evaluate the accuracy and reliability of any new products submitted to them.

DEVELOPMENT STRATEGY

Given that existing soil profile data is found in a wide variety of locations and forms, it seems appropriate to adopt a staged approach in order to facilitate the process of consolidating, standardizing, and optimizing access to soil profile information. Three distinct levels of data development are envisaged:

Level 1

Existing soil profile data, in its current form, regardless of content, structure, format or availability.  This data includes content found in printed form, as well as spreadsheets and databases.  Registration of this data will make it possible for users to find information, even if using it will be a chore.

Level 2

Data from Level 1 converted into a standardized structure and format that is designed to document, store, and exchange this information.  New data entering the system should be in this format, so it must be useable on a wide variety of platforms.  Data in this form will be easy to find, access and interpret, although it may not be optimal to support end user applications.

Level 3

Data from Level 2 converted into customized data structures and formats designed to support end-user applications.

CDSDC should build and host registries for Level 1 and Level 2 data, provide facilities to host Level 2 data, and create documentation associated with converting data from Level 1 to Levels 2 and 3.

ACTIONS

  1. Identify a group of individuals with a mandate to develop, maintain and serve the SDI needed for a Canadian soil profile data system.
  2. Draft copyright, privacy and access protocols for the dataset.
  3. Identify potential soil data providers via different avenues, some of which are listed below:
     1. Initial self-selected participants contribute to an initial list of data collectors and data holders known to them personally.
     2. Identify, then contact, the main governmental regulators in each province and ask them which companies submit the majority of data relating to soils and soil profiles to them for regulatory purposes. Then approach both the government agencies and the companies that submit data to them to determine whether they are interested in contributing to a shared data archive (and provide good arguments for why this would potentially benefit them).
     3. Make a list of soil or environmental consulting companies operating in Canada that collect or process soil observations and samples (e.g. Golder, Stantec, Millennium), and approach them directly to ask whether they would be willing to join an effort to collate and share profile data.
     4. Identify and target the main soil analytical labs in Canada and ask them whether they would provide the names and contacts of their largest-volume clients. Then contact the clients to see whether they would be willing to join a data-sharing exercise.
  4. Collate metadata on the holdings of the entities identified above that collect or host soil profile data, specifically:
     1. contact person or agency
     2. numbers of profiles or sampled sites in each holding or source
     3. geo-positional accuracy of sampled locations
     4. standards used for observations and measurements
     5. willingness and capacity to share
     6. conditions for participation or sharing
     7. copyright, privacy and access protocols
  5. For contributions to be retained at the original data provider:
     1. Set up a data discovery mechanism to allow visitors to the portal to find relevant records at their original source.
  6. For contributions to be retained in a central database:
     1. Initially, store such datasets as-is in the central database with appropriate metadata, including URLs to the originating institutions or sources (if applicable).
     2. Screen submitted data sets for possible inconsistencies using simple checks (basic quality control of original values and expressions). This is not yet harmonization.
     3. Compile under a common standard, using agreed data exchange standards and encoding of data, with an initial focus on a limited set of key soil attributes. This includes standardization of measurement units and automated quality control to flag unlikely values. Contributors may have their own quality control systems that can be applied before data are entered into the central database system.
     4. Develop and apply procedures for the harmonization of soil analytical method descriptions (initially for the key soil attributes above), followed by full quality control of in-pedon value consistency (e.g., there should be no carbonates in strongly acid soils).
     5. Correlate soil classification to the most recent CSSC and to Soil Taxonomy and WRB 2014 for those profiles that have already been classified according to an earlier well-known classification scheme. This is best done by the original data providers prior to submission to the central database. If this is not possible, existing conversion rules may be used.
     6. Release the first version of harmonized soil profiles, based on the present set of ‘shared’ datasets being processed in the central repository.
     7. Develop templates (data formats) for each significant data source/provider (with at least 500-1000 profiles) so that a selection of the original data may be harmonized and updated online. This requires development of uniform data exchange standards. Smaller providers can either develop their own templates according to specifications or adjust their data to one of the existing templates.
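
The unit standardization, flagging of unlikely values and in-pedon consistency check described above for centrally stored data could be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute names (`ph_h2o`, `org_c_pct`, `caco3_pct`), the plausible ranges, the unit table and the pH threshold used for "strongly acid" are all hypothetical and are not part of any agreed CDSDC standard.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed plausible ranges for a few key attributes (illustrative values only).
PLAUSIBLE_RANGES = {
    "ph_h2o": (2.0, 11.0),       # pH units
    "org_c_pct": (0.0, 60.0),    # organic carbon, % by mass
    "caco3_pct": (0.0, 100.0),   # calcium carbonate equivalent, % by mass
}

# Divisors that convert a mass concentration to the assumed standard unit (%).
UNIT_DIVISOR = {"%": 1.0, "g/kg": 10.0, "mg/kg": 10000.0}


@dataclass
class Horizon:
    """One horizon of a soil profile; None means 'not measured'."""
    name: str
    ph_h2o: Optional[float] = None
    org_c_pct: Optional[float] = None
    caco3_pct: Optional[float] = None


def to_percent(value: float, unit: str) -> float:
    """Standardize a mass concentration to percent; unknown units raise KeyError."""
    return value / UNIT_DIVISOR[unit]


def flag_horizon(h: Horizon, acid_ph: float = 5.5) -> List[str]:
    """Return human-readable flags; values are flagged, never altered."""
    flags = []
    # Range check: flag values outside the assumed plausible range.
    for attr, (low, high) in PLAUSIBLE_RANGES.items():
        value = getattr(h, attr)
        if value is not None and not (low <= value <= high):
            flags.append(f"{h.name}: {attr}={value} outside [{low}, {high}]")
    # Consistency check: carbonates reported in an acid horizon
    # (acid_ph is an assumed threshold for 'strongly acid').
    if (h.ph_h2o is not None and h.caco3_pct is not None
            and h.ph_h2o < acid_ph and h.caco3_pct > 0):
        flags.append(f"{h.name}: carbonates reported at pH {h.ph_h2o}")
    return flags


# Usage: organic carbon reported in g/kg is standardized before checking.
horizon = Horizon("Bm", ph_h2o=4.8, caco3_pct=3.0,
                  org_c_pct=to_percent(25.0, "g/kg"))
for message in flag_horizon(horizon):
    print(message)
```

Flagging rather than correcting keeps the original values intact, which matches the intent that early screening is not yet harmonization.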

DELIVERABLES

  1. List of institutions that hold profile data (along with an identified contact person)
  2. List of institutions willing and able to contribute data.
  3. List of numbers, locational accuracy and quality of identified data holdings.
  4. Copyright and data use policy for each potential contributor (Canadian government open data license http://open.canada.ca/en/open-government-licence-canada or Open Data Commons PDDL http://www.opendatacommons.org/licenses/pddl/1.0/)
  5. Agreed upon codes of conduct for participation and contributions
  6. Agreed upon rules for privacy, data sharing and enforcing any restrictions on use of any contributions not designated as fully open (open question: should it be possible to contribute data without sharing it in an unrestricted fashion?)
  7. A repository for scanned documents, original data files, and other not-quite-ready content where it can be accessed by any person that wants to extract and reformat the data (i.e. a Level 1 registry).
  8. An agreed design and functioning implementation for a centralized database to hold contributed soil profile and analytical data (i.e. a Level 2 data storage and exchange standard).  See http://data.soilinfo.ca/home/pedon_csv.html for a proposed data storage and exchange standard.
  9. An initial, functional centralized database of initially contributed data with a web interface and easy online access (i.e. a Level 2 registry and data storage facility).
  10. A web platform and templates for collecting additional soil profiles and accompanying analytical data from additional voluntary contributors into the future.
  11. Expanded functionality of the web portal, targeting especially potential crowdsourced and citizen science contributions
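
The lists of data holders and holdings among the deliverables amount to a small registry. One possible shape for a registry record is sketched below, using the holdings metadata named under ACTIONS. The class name, field names and example entries are hypothetical placeholders, not an agreed design or real holdings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HoldingRecord:
    """Hypothetical registry entry describing one data holding."""
    holder: str                             # institution or company
    contact: str                            # contact person or agency
    n_profiles: int                         # number of profiles or sampled sites
    positional_accuracy_m: Optional[float]  # None if unknown
    standards: str                          # standards used for observations
    willing_to_share: bool
    conditions: str = ""                    # conditions for participation or sharing
    licence: str = ""                       # copyright and data use policy
    level: int = 1                          # data development level (1, 2 or 3)

    def __post_init__(self) -> None:
        if self.level not in (1, 2, 3):
            raise ValueError("level must be 1, 2 or 3")
        if self.n_profiles < 0:
            raise ValueError("n_profiles must not be negative")


def shareable_profiles(records: List[HoldingRecord]) -> int:
    """Total number of profiles held by contributors willing to share."""
    return sum(r.n_profiles for r in records if r.willing_to_share)


# Usage with made-up entries (not real holdings).
registry = [
    HoldingRecord("Example Agency A", "contact-a@example.org", 1200, 10.0,
                  "hypothetical field manual", True, licence="open"),
    HoldingRecord("Example Company B", "contact-b@example.org", 300, None,
                  "unknown", False),
]
print(shareable_profiles(registry))
```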


OTHER IDEAS AND TEXT THAT COULD BE INCLUDED

  1. Create a mailing list to contact people who might be interested in contributing to compiling Canadian soil profile data, and a mechanism for letting new people self-identify and sign up to join the mailing list.
     1. Comment: I find mailing lists a bit outdated. The problem with mailing lists is that you need to register and everything is email-based. Nowadays, and especially in the case of the Canadian profile data, it would be better to create a group on a social network (as Peter suggests, on LinkedIn) and use it to promote data gathering and contributions.
  2. A collaboration platform like a wiki where we could explain what we are trying to do, host discussions, provide explanations and documentation of procedures for contributing data, and provide status updates and feedback on activities to date.
  3. A page where we can ask people to identify any Canadian data sets that they are aware of and that might be captured (just a table or list).
  4. Platforms where people could actually enter their data:
     1. A simple FTP site or Dropbox where people could dump any already digital databases or scanned paper records.
        1. Comment: I am not a fan of the legal aspects of Dropbox. We have installed ownCloud at ISRIC (https://owncloud.org/) and it has worked pretty well; we now have a sort of Dropbox, but running only on ISRIC servers.
     2. A flexible database where people could enter profile data according to one or more formats compatible with the majority of Canadian data.
        1. Comment: This is also related to your last email. I believe that too much flexibility in WOSIS1.0 and WorldSoilProfile caused serious problems in development. I prefer the approach taken by soilinfo: first a properly organized template and structure for data gathering, with flexibility added later. I also like flexibility; it is just a matter of the sequence of things.
     3. A set of tools or procedures to help people convert data from scanned PDFs (or non-compatible DBFs) into one of the accepted Canadian standard formats.
        1. Comment: I am not certain what material you have. Could OCR software do this, with the final checks then outsourced to a cheaper country?
  5. A platform that would display the soil profile data:
     1. spatially for point data (in Google Maps or via an app like SoilInfo), and
     2. in table form (as tables of profile descriptions).

PROCESS

We propose first to set up and host an online discussion forum to solicit input from self-selected interested parties and to arrive at collective agreement regarding:

  1. Goals and objectives of the Canadian Digital Soil Data Consortium (CDSDC)
  2. Responsibilities and contributions of all self-selected participants
  3. Agreed upon protocols for:
     1. participation
     2. governance
     3. acceptable behaviours and codes of conduct
     4. data sharing
     5. key targeted data types and data sources
     6. intellectual property rights
     7. design, structure and content of the proposed system

Through this discussion, the initial participants in the CDSDC will hopefully clarify the following  items:

  1. What kinds of soil profile observations and analytical data will be targeted for inclusion in the proposed database (what may be excluded or not solicited)
  2. Legally or organizationally mandated geographic areas of responsibility for each potential contributor (if any such conditions apply)
  3. Agreement on who will set up, maintain and manage the central platform and web site
  4. A code of ethics for conduct, participation, privacy, data sharing, and use
  5. Agreement on intellectual property rights and modalities of data publication, including data licensing (all data free and open, or with the possibility of restrictions on some data)
  6. Technical standards for data entry and data exchange
  7. Evaluation and quality control protocols
  8. Data infrastructure concepts: data portal, ontology, thesaurus, metadata standards and editor.
  9. Harmonization protocol(s), if any
  10. Rules and protocols for access to, and use of, data stored in the system

Three possible approaches for data collection:

  1. Target the main soil analytical labs in Canada and ask them whether they would provide the names and contacts of their largest-volume clients. Then contact the clients to see whether they would be willing to share data.
  2. Make a list of soil analytical laboratories or environmental consulting companies operating in Canada that collect or process soil observations and samples (e.g. Golder, Stantec, Millennium), and approach them directly to ask whether they would be willing to join an effort to collate and share profile data.
  3. Contact the main governmental regulators in each province and ask them which companies submit the majority of data relating to soils and soil profiles to them for regulatory purposes. Then approach both the government agencies and the companies that submit data to them to suggest that they contribute to a shared data archive.

These approaches would be best if conducted under the auspices of some kind of recognized agency or organization.

 

People who have already assembled soil profile data:

  1. Cindy Shaw
  2. Jean-Daniel Sylvain (Quebec profiles for his PhD)
  3. Alberta Agriculture (SIDMAP, CAESA SIP, Benchmark Sites Project) (see Tom Goddard or David Spiess)
  4. Chuck Bulmer (BC working with Scott Smith)
  5. Doug Aspinall (Ontario DSM work)
  6. Ben Stewart (AAFC)

INTERNATIONAL IMPLICATIONS

The rationale for activities undertaken in Canada to identify, collate, harmonize and make available soil profile data also benefits from being considered within the larger context of the four main recommendations of the action plan of the FAO Global Soil Partnership (GSP), namely:

1. “An enduring and authoritative system for monitoring and forecasting the condition of the Earth’s soil resources should be established under the auspices of the Global Soil Partnership to meet international and regional needs.”

2. “The global soil information system [GSIS] should use soil data primarily from national and within-country systems through a collaborative network and the distributed design should include facilities for incorporating inputs from new sources of soil data and information that are evolving rapidly.”

3. “The global soil information system [GSIS] should be integrated into the much larger effort to build and maintain the Global Earth Observing System of Systems [(GEOSS)]. . . and close attention should be given to issues relating to the protection of privacy, intellectual property rights and terms of use.”

4. “Implementation of the global soil information system [GSIS] should include a training program to develop a new generation of specialists in mapping, monitoring and forecasting of soil condition, with an emphasis on countries where improved soil knowledge is essential for food security and restoration and maintenance of ecosystem services.”

The key point here is that individual countries and states are expected to contribute both existing and new soil observation data to the global system, while at the same time servicing national to local needs.  The expectation of the GSP is to source the most reliable data from the bottom-up, i.e., from the institutions directly responsible for soil survey and soil geographic databases.