Questionnaire on data citation and persistent identifiers (4)
ENVRIplus, a Horizon 2020 project bringing together environmental and Earth system research infrastructures, invites you to participate in a survey on how your organization views the role of persistent identifiers for research data. The survey consists of 9 questions, and the estimated time for answering them is about 10 to 15 minutes. Background information on this study and a list of definitions of terms are provided at the end of this document.

We would appreciate it if you could answer the questions in the questionnaire and return your answers by e-mail to:  no later than 1 February 2018.

Please note that this questionnaire is not anonymous: we ask you to leave your contact details (name and e-mail) so that we can send you follow-up questions if needed and share our final report with you afterwards. Your answers will, however, only be kept in our files for the duration of this project, and any information included in our report will be fully anonymized before publication. Don't hesitate to contact us if you have any questions; see our contact details below.

See also the list of definitions below the questions. Fields marked with * are mandatory; terms marked with * in the questions are explained in the list of definitions.
Your name and organization *
Your e-mail *
Please tick the box to give your consent to the following:
Please tick the box if you wish to get a copy of our final project report:
1. What does the concept “persistent” in persistent identifiers * mean to you and to the services in your organization?
2. Which types of PIDs do you allow, or could you allow, in your services? Mark the relevant options:
3. a) What is your opinion on PID-based references pointing to samples, instruments and stations in scientific articles?
3. b) Would it be feasible for you to provide PID services for such references to non-data objects? Mark the relevant option:
4. What is your opinion on peer review of datasets as a condition for obtaining a PID?
5. What is your opinion on allowing bibliographic references to dataset fragments * or subsets (i.e. by appending pointer information to the PID of the dataset)?
6. What is your opinion about using PIDs for data collections * (i.e. collections of several datasets)?
7. a) What is your opinion on dynamic datasets *?
7. b) What is your opinion on assigning PIDs to search queries * rather than to the results of a query? (See the list of definitions for a description of PIDs for search queries.)
8. a) In relation to persistence of PIDs, what is your opinion on the “sustainability” of your products and services?
8. b) What time frame would constitute “sustainable” for your services?
9. Do you have any other comments or ideas on persistent identifiers and data citation?
Background of this study
Assigning persistent identifiers to different kinds of data objects is becoming increasingly important in research practice. Identifiers are crucial, not least for sharing information and making connections between research data. Managing identifier systems and setting standards for assigning persistent identifiers to data objects is a major challenge, and much technical development work remains to be done. In environmental and climate research there are currently many discussions on how to agree on common standards and practices for data citation and persistent identifiers.

The ENVRIplus project is actively engaged in these questions on standards and practices for data citation and persistent identifiers. One of the ENVRIplus work packages, Work Package 6 “Research Infrastructure data identification and citation services”, focuses on data identification and citation. Within this work package we are now sending out a questionnaire to publishers, PID service providers and other organizations that provide services to research infrastructures. The questionnaire is also a follow-up to the ENVRIplus workshop “Closing the gap: The need for tools to identify, track and cite environmental research data”, held in Hamburg in October 2017. Further information on the ENVRIplus workshop is available at:

List of definitions
Persistent identifier (PID)
A persistent identifier is a long-lasting ID represented by a string that uniquely identifies a data object (DO) and that is intended to be persistently resolved to meaningful state information about the identified data object.

Fragment dataset
In research it may be necessary to cite subsets of a dataset (fragment datasets) at a very granular level, in particular when the dataset is constantly being changed and updated.
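Question 5 mentions citing a fragment by appending pointer information to the PID of the dataset. The sketch below illustrates that idea for a DOI resolved through the doi.org proxy. The DOI, the pointer keys (`variable`, `start`, `end`) and the function name are hypothetical; there is no agreed standard syntax for such pointers, and whether a resolver forwards them to the landing page depends on the PID service.

```python
from urllib.parse import quote, urlencode

RESOLVER = "https://doi.org/"  # DOI proxy resolver; other PID schemes use other resolvers


def fragment_reference(doi: str, **pointer: str) -> str:
    """Build a reference to a dataset fragment by appending pointer information.

    The pointer keys are illustrative assumptions only, not a standard.
    """
    base = RESOLVER + quote(doi, safe="/")
    if not pointer:
        return base  # no pointer: cite the whole dataset
    # Sort the keys so the same fragment always yields the same reference string.
    return base + "?" + urlencode(sorted(pointer.items()))
```

For example, `fragment_reference("10.1234/example-dataset", variable="co2", start="2017-01-01", end="2017-12-31")` yields `https://doi.org/10.1234/example-dataset?end=2017-12-31&start=2017-01-01&variable=co2`. Sorting the keys keeps the citation string stable regardless of argument order.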

Data collection
In some cases there may be a need to gather several datasets into a data collection.

Dynamic data
Dynamic data refers to datasets that may change over time, e.g. because new data have been added or existing data have been updated or corrected.

Queries to data stores
Instead of storing many duplicates of subsets of data, it is possible to create specific queries that identify and retrieve certain subsets of data. The queries may also be stored in a query store, so that they can be re-run and re-used.
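A query store of this kind can be sketched as follows, loosely in the spirit of the approach described by Rauber et al. (see Further reading): each stored query keeps its normalized text, an execution timestamp (relevant for dynamic data) and a hash of the result, and receives its own PID. The class names, the PID prefix `10.1234/q` and the sequential numbering are hypothetical; a real service would use persistent storage and a registered PID service.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def result_hash(rows: list[str]) -> str:
    # Sort rows first so the hash does not depend on the order the store returns them in.
    return hashlib.sha256("\n".join(sorted(rows)).encode("utf-8")).hexdigest()


@dataclass
class QueryRecord:
    pid: str
    query: str
    executed_at: datetime  # timestamp lets the query be re-run against the data as it was
    result_hash: str


@dataclass
class QueryStore:
    """Hypothetical in-memory query store; a real one would be persistent."""
    prefix: str  # hypothetical PID prefix
    records: dict[str, QueryRecord] = field(default_factory=dict)

    def register(self, query: str, rows: list[str], executed_at: datetime) -> QueryRecord:
        # Normalize whitespace so trivially different spellings of one query match.
        normalized = " ".join(query.split())
        digest = result_hash(rows)
        # Reuse the existing PID if the same query already returned identical results.
        for rec in self.records.values():
            if rec.query == normalized and rec.result_hash == digest:
                return rec
        pid = f"{self.prefix}{len(self.records) + 1}"
        rec = QueryRecord(pid, normalized, executed_at, digest)
        self.records[pid] = rec
        return rec

    def verify(self, pid: str, rows: list[str]) -> bool:
        """Check whether re-running the stored query reproduced the cited result."""
        return self.records[pid].result_hash == result_hash(rows)
```

Assigning the PID to the query record rather than to a copy of the result avoids duplicating data, while the stored hash allows anyone re-running the query to check that they obtained the same subset.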

PID types:

Archival Resource Keys (ARK)
The Archival Resource Key (ARK) is a Uniform Resource Locator scheme intended to serve as a long-term persistent identifier. The ARK system was developed by the California Digital Library in 2003.

Digital Object Identifiers (DOI)
A Digital Object Identifier (DOI) is a unique identifier linked to a specific object, which must be a clearly defined piece of intellectual property. The DOI system was introduced by the International DOI Foundation in 1998 and is built on the Handle System.

The Handle System was developed by the Corporation for National Research Initiatives (CNRI). It supports the assignment of globally unique persistent identifiers for locating digital resources over time, independently of their current or future storage locations. The Handle System is used by thousands of organizations to assign persistent identifiers; the DOI system, for example, uses the Handle protocol.

Life Science Identifiers (LSID)
The Life Science Identifier system was introduced by the Object Management Group (OMG) in 2004 as a system to uniquely name life science entities. LSIDs are used by leading global providers of biodiversity data to identify organism names.

Persistent URL (PURL)
A persistent uniform resource locator (PURL) is a URL (i.e., a location-based uniform resource identifier, or URI) that redirects to the current location of the requested web resource. The PURL system was developed by the Online Computer Library Center (OCLC) in 1995.

Uniform Resource Name (URN)
URNs were intended to serve as persistent, location-independent identifiers, allowing the simple mapping of namespaces into a single URN namespace.
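The PID types above differ visibly in their syntax, which matters when a service has to decide which identifiers it accepts (question 2). The sketch below classifies an identifier string by simplified syntax patterns: DOIs start with `10.` followed by a prefix and a suffix, ARKs with `ark:`, LSIDs with `urn:lsid:`, other URNs with `urn:`, and Handles have a `prefix/suffix` form. The patterns are deliberately simplified assumptions, the resolver and PURL host lists are incomplete, and the example identifiers are illustrative; real schemes allow more variation than shown here.

```python
import re
from urllib.parse import urlparse

# Simplified syntax patterns (assumptions, not the full specifications).
# Order matters: a DOI is also a valid Handle, and an LSID is also a URN.
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d+(\.\d+)*/\S+$")),
    ("ARK", re.compile(r"^ark:/?\d+/\S+$", re.IGNORECASE)),
    ("LSID", re.compile(r"^urn:lsid:[^:]+:[^:]+:[^:]+(:[^:]+)?$", re.IGNORECASE)),
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)),
    ("Handle", re.compile(r"^\d[\w.]*/\S+$")),
]

# Proxy resolvers whose URL path is the identifier itself (incomplete list).
RESOLVERS = {"doi.org", "dx.doi.org", "hdl.handle.net", "n2t.net"}
PURL_HOSTS = {"purl.org", "purl.oclc.org"}


def classify_pid(identifier: str) -> str:
    """Return a best-guess PID type for an identifier string."""
    s = identifier.strip()
    url = urlparse(s)
    if url.scheme in ("http", "https"):
        host = url.netloc.lower()
        if host in PURL_HOSTS:
            return "PURL"
        if host not in RESOLVERS:
            return "URL"  # a plain URL, not a recognized persistent identifier
        s = url.path.lstrip("/")  # strip the resolver to get the bare identifier
    for name, pattern in PATTERNS:
        if pattern.match(s):
            return name
    return "unknown"
```

For example, `classify_pid("https://doi.org/10.1234/example-dataset")` returns `"DOI"`, and `classify_pid("urn:lsid:ipni.org:names:20012728-1")` returns `"LSID"` because the LSID pattern is tried before the generic URN pattern.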

Further reading
• Duerr et al. On the utility of identification schemes for digital earth science data: an assessment and recommendations. Earth Sci Inform. 2011; 4: 139-160.

• Socha YM. Out of cite, out of mind: The current state of practice, policy, and technology for the citation of data. Data Science Journal. 2013; 12 September.

• Hellström et al. A system design for data identifier and citation services for environmental RIs projects to prepare an ENVRIPLUS strategy to negotiate with external organisations, Work Package 6 – inter-RI data identification and citation services. ENVRIPLUS; 2017.

• ENVRIPLUS. Presentations for the workshop “Closing the gap: The need for tools to identify, track and cite environmental research data”. [Internet]. 2017 [cited date 2018-01-10]. Available from:

• Rauber et al. Identification of Reproducible Subsets for Data Citation, Sharing and Re-Use. Bulletin of the IEEE Technical Committee on Digital Libraries. 2016; 12(1).

• Dodds et al. Creating Value with Identifiers in an Open Data World. Open Data Institute, Thomson Reuters; 2016.

Further contact and information
If you have any questions regarding the questionnaire, please contact Maria Johnsson, e-mail: , Lund University, project member in ENVRIplus Work Package 6.

Other ENVRIplus Work Package 6 contacts:
• Alex Vermeulen, Margareta Hellström, Lund University / ICOS Carbon Portal
• Frank Toussaint, Stephan Kindermann, DKRZ (Deutsches Klimarechenzentrum)
• Robert Huber, Markus Stocker, Universität Bremen

