Objects and Concerns

	A	B	C	D	E	F
1	Clusters: 1. Literature and related objects 2. Data and related objects 3. Software and related objects 4. Samples 5. Ontologies and vocabularies		6. Complex research objects (esp. but not exclusively learning resources) 7. Instruments and facilities 8. Organizations 9. Activities
2	Cluster No.	Research Object (note that the RO community considers an RO to be a complex object with multiple components)	Definitions	Examples	When or under what circumstance is it necessary to identify an object to enable reproducibility of the science or other use?	comments/notes

3	1	literature - peer reviewed			When the object is formally published or when it is first referenced, whichever is earlier.
4	1	literature - non peer-reviewed publications			When the object is suitable to be referenced (e.g. published on a pre-print server or otherwise made public)
5	1	literature - Reports (grey lit.), SOPs, protocols, user guides, product documentation, etc.		protocols.io kbase	When the object is suitable to be referenced (e.g. has been internally reviewed and signed off, placed in a publicly accessible place, contains key calibration information, documents the protocols used, etc.)
6	1	algorithm documentation	Human readable documentation meant to specify algorithms as intended for implementation in software	an element of a document (e.g. paper, preprint, blog), non-machine-readable text, a specification of software	When implemented, the algorithm should be identified and ideally published in a document (e.g., ATBDs)	AI may have unique considerations. Should go in the same bin as publications. A paper may have multiple algorithms, and we need to identify the specific one referenced (this is related to how to identify an element of a paper, such as a figure) (In the 9 Jan workshop, we wondered whether this row should be under literature rather than software, and have now changed the title slightly and the cluster number to match)
7	2	data (Collection level)	a bundle of bits and/or files that is being used for an analysis or has been compiled as a dataset	a data set in a repository, an assemblage of data used by a researcher in an analysis	When (or just before) data are used or made publicly accessible, they are named. A resolvable identifier may be registered later, when a landing page is prepared with pointers to documentation, quality information, etc.	The naming and location aspects of an identifier are handled in two steps. First, an identifier (name) is assigned before a formal collection or data set is produced (this allows the identifier to be included in the metadata). Second, the identifier is registered (location), which should occur just prior to publication of the dataset (or the result).
8	2	data (granules)	same as collection for this use case		When it is used (as part of a collection)	RAMA: Identification must occur when any granule is produced so that it can be uniquely identified and fetched later. Otherwise, reproducibility is impossible.
9	2	service products	the result of an event that results in a type, product, and status	- Products produced as a result of executing an API, an event detected in streaming data	When a defined event occurs--a service call, a defined pattern in the data, etc.--a name and retrieval process need to be defined.
10	2	images, video, and audio	Same as collection or service product	- Static figures - Static maps - Stand alone videos	similar to data collections, includes interpreted products like figures and maps
11	2	model outputs			similar to data collections
12	3	software	A set of instructions that performs some action, either as source code (machine-readable) or executable	- source code (e.g., something from GitHub or a local disk) - project analytical code (e.g., R script) - a compiled executable (e.g., ArcGIS) - software dependencies (e.g., published library such as numpy, unpublished local libraries, included source code) - runtime environment (e.g., OS version)	When the object is formally published or when it is first referenced, whichever is earlier. There may also be a need to identify the software at the time of execution because of dependencies. In the case of provenance tools, the act of execution can create a reference, which might be to a local name for the object	Software has a way of mutating and propagating among many authors. Care must be taken to refer to authoritative repositories for that 'strain' of code and to the release versions. Also note: software that is being used has dependencies, so referencing the software can also require identifying all the dependencies. Also, references to published software packages are in many ways easier than references to project-specific, one-off code.
13	3	software service (accessed via an endpoint via an API)	software function(s) that are remotely executable over networks	OGC services, such as WMS	When the service is formally published or when it is first referenced, whichever is earlier.	Hosted services cannot be guaranteed to last in perpetuity. Burden on the hosting center. Some kind of sunsetting documentation or forwarding address necessary in such cases. Also, services are often black boxes, with semi-ambiguous implementation details.
14	3	Software workflows	Processes/steps that capture how an object is created. Can be stored in multiple formats, as software (a script) or as data (a set of steps, inputs, and outputs)	YesWorkflow annotated scripts, Pegasus DAX, Jupyter Notebooks, KBase	same as for "software"	RAMA: Note interesting article on this topic - https://doi.org/10.1029/2019EO136216 DSK: Also see https://doi.org/10.5281/zenodo.3336147 and https://danielskatzblog.wordpress.com/2019/02/05/using-workflows-expressed-as-code-and-workflows-expressed-as-data-together/ (In the 9 Jan workshop, we wondered whether this row should be combined with the "software" row)
15	3	models			When the object is formally published or when it is first referenced, whichever is earlier.
16	3	Visualization tool/Web application			When the object is formally published or when it is first referenced, whichever is earlier.	Web applications have a way of 'upgrading' behind the scenes without accountability to 'definitive' releases. URL, version, and sometimes date of access all need to be indicated.
17	4	Physical samples	Subset of an object or sampling feature; the resulting object may be representative of the whole or concept. It may be representative of an object or a subset. Other related terms include material sample and specimen. Specimen can be considered a result of sampling. A similar term is sampling feature: the feature that allows access to the sample, from which a subset will be taken, e.g. a well or borehole where cores and subsamples might be taken. Samples may be ephemeral, destroyed during the sampling process and/or analysis.	Rock sample taken from an outcrop; sediment core (lake, land, ocean); water sample taken from a stream; fossil specimen; material created in a laboratory.	When the object is formally published/registered or when it is first referenced, whichever is earlier. (How to deal with samples destroyed in use? Detailed documentation is important.) The ideal process would be to identify the sample when it is being captured (in the field). When that is not possible (digital devices not available in the field), as soon as possible upon return. When submitting a sample to a repository for archiving. When submitting a sample to a lab or other organization for analysis of some sort. Can be dependent on the domain of the researcher or the purpose of the research activity. Some research activities support creating a citation before the sample is captured, to SUPPORT the capture of the information at the same time as sampling. Also needs to support cases where the citation is needed a long time after the sampling activity took place (e.g. a researcher who wants a citation years after collecting the sample because research needs change), or citations being created when a sample is sent out on loan. This is about being able to reproduce the context in which the sample was collected.	Do you need to identify a sample BEFORE you collect it (to help with documentation capture or creation)? What about citing a collection of samples? Do you cite each individually or as a set? Does the set get a unique citation, and at what granularity do you make the decision as to what is a set or an individual sample? How do we provide guidance where there are differences between domains or use cases? Consideration: someone funded to collect samples in a difficult or rare location should be under more pressure to create the citation at that time, whereas someone going out on their own to collect samples has more leeway as to when to create a citation. Recommend allowing the option to separate the sampling event from the sample, with different citation material captured for each. Some information from the sampling event would not be maintained with the sample, and it can be difficult to aggregate samples around concepts contained in the sampling event.
18	4	Specimen
19	4	physical (?) field/lab notebooks	Physical document (handwritten, typed, etc.) with annotations and drawings that may never be digitized but contains research provenance. May be considered data and related objects to samples, but may also be created without being tied or connected to any sampling activity.	Geologist field notebook capturing notes and drawings about where a sample is collected. Lab notebook capturing information about the steps taken in an experiment. May contain ideas or information not formally represented in a published report or paper. This information can provide provenance or clarification when questions arise in replicating the resulting data.	When the object is suitable to be referenced (e.g. has been internally reviewed and signed off, placed in a publicly accessible place, contains key calibration information, documents the protocols used, etc.).	How is this different in a digital age? It is still relevant but may have a larger focus on legacy data. Not confident it belongs tied to the sampling concept, as some lab or field notebooks are not related to any sampling activity and others only represent metadata about the sample. Not exclusively a consideration connected to samples.
20	4	Physical notes (cards, labels on bottles)	Reference information about a physical sample or sampling event, or sampling device. It is not necessarily digital, and may be attached to the sample or stored separately in a catalog system.	Index cards, labels on bottles
21	5	ontologies/vocabularies	Any kind of semantic resource along the semantic gradient - from glossaries, controlled vocabularies, thesauri, taxonomies, ontologies, etc.	SWEET, ENVO, DOLCE, ODPs, linked data, controlled vocabularies, owl, Fierz, C., Armstrong, R.L., Durand, Y., Etchevers, P., Greene, E., McClung, D.M., Nishimura, K., Satyawali, P.K. and Sokratov, S.A. 2009. The International Classification for Seasonal Snow on the Ground. IHP-VII Technical Documents in Hydrology N°83, IACS Contribution N°1, UNESCO-IHP, Paris.	When the resource (vocabulary, etc.) is used, in whole or part, formally published or when it is first referenced (e.g. in data/md), whichever is earlier. The primary issue is that semantic resources vary from documents with traditional citations to software, etc. Worse yet, many forms (e.g., OWL ontologies) can play different roles and thus have some of the attributes of data, software, and documentation all at once. Where terms are used in queries to extract data subsets, dynamic data citation principles would require a citation. When the resource is used in the process to generate the research object, requirements to cite your methods and establish provenance would also require it.	Gary Berg-Cross added linked data, which can be used to define vocabularies (a cat is a mammal, which can be expressed in LD as well as in an ontology) as well as data. We had questions such as: do catalogs count? We employed a knowledge graph use case to discuss what vocabularies and ontologies might have been used to put it together. To make it or a piece of it usable we need to cite the vocabulary and/or RDF/ontology sources that were used. There was also the idea that identifiers which would later be used for citation would more efficiently be available to find vocabularies and such while being worked on. This would allow people to cite pre-published material for work groups.
22	6	complex digital objects esp. learning objects	A complex digital object is defined by the Digital Curation Centre as a discrete digital object made by combining a number of digital objects.	- Learning resources, Jupyter notebooks	When the object is intended to be shared, and possibly re-used as such or to be adapted or subset. A learning resource may contain, for instance, a set of powerpoint slides, an audio representation of the creator or other presenter making a presentation of the slides, a video of both the slides and audio track of the presentation, learning activities that provide opportunities for hands-on training associated with the subject of the training, and an answer key. While each of the components of the entire complex object could have its own citation, the entire object should be cited in order to retain the integrity of the entire object, provide enough information for its re-use as the creator intended, and provide credit to the learning resource creator. ALEXIS: Like with software, if a teacher updates a spreadsheet they use in an exercise, does it need to be versioned? How much change indicates versioning?	Web applications have a way of 'upgrading' behind the scenes without accountability to 'definitive' releases. URL, version, and sometimes date of access all need to be indicated. Notes from working session: Use cases / examples: 1. KBase (container for snapshot of a research object) • Has a narrative that describes all steps, components and path which may or may not include order or sequence of use of components or actions, plus an image / visualization of the results of the action using the component (e.g., dataset) • Each of the components has its own identifier that is persistent w/in its own system • Important that narrative includes the path to getting the overall citable object. • It’s unclear whether the view of the results has its own identifier; one would be preferable (want to use as an illustration in a paper, for instance) 2. Image that has other images included: the overall image gets the citation, but the description (metadata for the overall image citation) should include enough information to find the source of the other images. However, the source information of the component images should not be included in the citation for the overall object. 3. Learning resource: If it contains static components, such as slides, video / audio file of a presentation of slides, perhaps the transcript of the presentation, supplementary spreadsheets, perhaps a pertinent training exercise, all of which exist only within the “container” of the learning resource and don’t exist outside, the overall resource would receive the citation. The static components within don’t need a citation or a DOI for themselves. If the components do exist outside and “want” to be made available separately and publicly, then they should get their own citation. Other Notes: For the resources identified above, we wouldn’t necessarily need to change the citation if documentation changes (e.g., landing page); e.g., if a new version of a component dataset is created. Component items within a collection should have their own identifier that is persistent within the system in which they are created, i.e., a pertinent row within a database that is used for illustrative purposes. Used the example of an insect collection which might contain specimen identifiers for each specimen, DNA analyses identifiers for each analysis, an identifier for the collection event for each insect in the collection.
23	7	specialty instruments	Specialty instruments are measurement systems that may be deployed on multiple research platforms. Should include whether it is an instance of an instrument or a class of instrument. Instruments are elements of measurement systems	Nitric Oxide Chemiluminescence Ozone Instrument (can be installed and deployed on various research aircraft), see https://doi.org/10.5065/D6SN070H	When the instrument is used to produce shared data or analysis. Reproducibility for sensitive instruments may be a challenge due to ITAR.	RAMA: Not sure people will be willing to share information about specialty instruments. This can be a very sensitive area - especially ITAR considerations. However, if someone feels free to share the information, it should be when the instrument is mature, proven through tests to be ready for operations, and documentation is publicly accessible.
24	7	facilities	Large research platforms such as aircraft, ships, radars, etc. Often, these facilities may be populated with various specialty instruments which are DOI'd separately -- would some special laboratories, for example for making measurement under high pressures, be included? Platforms and facilities are different. Facilities are like NCAR. We will assume we are talking about platforms here.	ROR, NSF C-130 research aircraft, see https://doi.org/10.5065/D6WM1BG0 see ror.org for information about RORs	Facilities often have a plethora of other research artifacts like software, instruments, data systems, etc. which each have various versions and may be DOI'd separately. To address reproducibility, all of these other DOIs must be included and referenced. Facilities that host and maintain instrumentation for use in multiple experiments or campaigns use identifiers to track usage and help ensure that they get credit for their important contributions.	While a DOI for facilities is often used to track usage, it is also important to give credit to authors who were the first users/developers of this facility. Ideally, this would be a "related identifier" in the DataCite metadata for the facility DOI. See https://datascience.codata.org/articles/10.5334/dsj-2017-007/ Facilities are an organization type in ROR (GRID) so many of them currently have RORs. Identifier metadata should include related identifiers that describe relationships between facilities, platforms, instruments, and sensors.
25	8	organization - repository			When the repository starts operations, distributing research objects to users. While not critical for reproducibility, it is an important part of provenance. A user should be able to contact the repository with questions, if any, about the research object - so it supports reusability and reproducibility.	See http://wiki.esipfed.org/index.php/Category:Identifying_ESIP_Connections for information about U.S. Federal RORs and ESIP RORs
26	8	organization - funder			Not needed for reproducibility
27	9	activities - projects			The identity of a project by itself may not be important for reproducibility. On the other hand, it is important to identify research objects that result from a project, any events within the project that may affect reproducibility (e.g., instrument logs of calibration events, outages, etc.) and identity of a project can be part of the metadata associated with the objects for completeness (and distinguishing between similar objects from different projects).	Note this activity identifier project: https://www.raid.org.au/ It depends on what we mean by "identify". Since projects are temporary, it might make sense to identify the paper or report about the results of the project. RAMA: I think it also depends on the nature of the Project. For example, EOS Terra as a project does not exist anymore, but information about the project exists (https://terra.nasa.gov/about). Likewise Aqua - https://aqua.nasa.gov/.
28	9	Activities - campaign		- Field research	Similar to projects
29	9	activities - mission		- Satellite	Similar to projects
30	9	Conferences			The identity of a workshop or conference by itself may not be important for reproducibility. On the other hand, it is important to identify research objects that result from the meeting, any circumstances within the meeting that may affect reproducibility (e.g. funder, attendees) and identity of a meeting can be part of the metadata associated with the objects for completeness (and distinguishing between similar objects from different meetings). Needed upon product generation/publication.
31	9	activities - expeditions		- Field trips - Research cruises	The identity of an expedition by itself may not be important for reproducibility. On the other hand, it is important to identify research objects that result from an expedition, any events within the expedition that may affect reproducibility (e.g., instrument logs of calibration events, outages, etc.) and identity of an expedition can be part of the metadata associated with the objects for completeness (and distinguishing between similar objects from different expeditions).
32	?	Research Object (in situ observations) - Annotations	Text related to a research object	underwater passive acoustic data annotations such as biological sounds (call-types) / anthropogenic sounds (vessels) / environmental sounds (rain), video annotations (e.g., annotations of seafloor substrate or marine organisms in underwater video), field observations (e.g., bird sightings)	When a methodology (human or machine) was used to make the annotations and they are made available, or when analysis is conducted upon those annotations	RDA Preserving Scientific Annotations WG might have some insights. Some debate on whether this is its own research object or a derived dataset. Is the citable object here the set of annotations or individual annotations?
33	?	Research Object (literature) - Annotations	Text related to a research object	NCAR Climate Data Guide (https://climatedataguide.ucar.edu/)?	When the annotation changes the scientific nature of the annotated object (e.g., the annotation corrects a value or notes a limitation or uncertainty).	annotation might correct a graph in a previous paper, something that is submitted to the publisher for correction to a previous publication
34	?	Data Management Plans			Not needed
35	?	Funding mechanisms		- Grants - Scholarships	Normally not needed for reproducibility; however, when there are concerns about conflicts of interest, etc. it may be important at time of publication of results.
36	?	metadata			Similar to data, but may need to be treated separately because of questions about the authenticity of the metadata and about its location in a public registry or clearinghouse, as opposed to within a local catalog.