1
Source | Research Artifact Types | literature - peer reviewed | literature - non peer-reviewed publications | literature - Reports (grey lit.), SOPs, protocols, user guides, product documentation, etc. | algorithm documentation | data (Collection level) | data (granules) | service products | images, video, and audio | model outputs | software | software service (accessed via an API endpoint) | Software workflows | models | Visualization tool/Web application | Physical samples | Specimen | physical (?) field/lab notebooks | Physical notes (cards, labels on bottles) | ontologies/vocabularies | complex digital objects, esp. learning objects | specialty instruments | facilities | organization - repository | organization - funder | activities - projects | activities - campaigns | activities - missions | activities - conferences | activities - expeditions | Annotations to a research artifact | Annotations to literature | Data Management Plans | Funding mechanisms | metadata
2
Challenges/Things to consider | Persistence | Access/Reuse | Give Credit/Blame | Semantics | Forensics | Artifact Definition
Human-readable documentation meant to specify algorithms as intended for implementation in software.
An identifiable bundle of bits and/or files that is being used for an analysis or has been compiled as a dataset.
The result of an event that results in a type, product, and status.
A set of instructions that performs some action, either as source code (machine-readable) or as an executable.
Software function(s) that are remotely executable over networks.
Processes/steps that capture how an object is created. Can be stored in multiple formats, as software (a script) or as data (a set of steps, inputs, and outputs).
Subset of an object or sampling feature; the resulting object may be representative of the whole or of a concept. It may be representative of an object or a subset. Other related terms include material sample and specimen. A specimen can be considered a result of sampling. A similar term is sampling feature: the feature that allows access to the sample and from which subsets are taken, e.g. a well or borehole where cores and subsamples might be taken. Samples may be ephemeral, destroyed during the sampling process and/or analysis.
Biological specimen.
Physical document (handwritten, typed, etc.) with annotations and drawings that may never be digitized but contains research provenance. May be considered data and related objects to samples, but may also be created without being tied or connected to any sampling activity.
Reference information about a physical sample, sampling event, or sampling device. It is not necessarily digital, and may be attached to the sample or stored separately in a catalog system.
Any kind of semantic resource along the semantic gradient, from glossaries, controlled vocabularies, thesauri, and taxonomies to ontologies, etc.
A complex digital object is defined by the Digital Curation Centre as a discrete digital object made by combining a number of digital objects.
Specialty instruments are measurement systems that may be deployed on multiple research platforms. Should include whether it is an instance of an instrument or a class of instrument. Instruments are elements of measurement systems.
Large research platforms such as aircraft, ships, radars, etc. Often, these facilities may be populated with various specialty instruments which are assigned DOIs separately. Would some special laboratories, for example for making measurements under high pressures, be included? Platforms and facilities are different. Facilities are like NCAR. We will assume we are talking about platforms here.
Text related to a research object.
3
x | RDA PIT WG | Data replication: Manage replicas created from master objects for data safety purposes.
4
x | RDA PIT WG | Data access load leveling: Provide access to alternative data objects depending on availability and performance.
5
x | RDA PIT WG | Format obsolescence audit: Assess the format obsolescence risk of individual resources.
6
x | RDA PIT WG | Versioning: Give access to newer versions of an identified object and provide suitable context information.
7
xxx | RDA PIT WG | Composite objects: Reflect discipline-specific ranges of object granularity through type-encoded relations.
8
xx | RDA PIT WG | Managing data objects and metadata objects in combination: Coverage of specific community scenarios that require both individually identified and interrelated data and metadata objects.
9
x | RDA PIT WG | Managing object access permissions: Enable fast decision-making of access control systems based on pure envelope evaluation.
10
x | RDA PIT WG | Managing write control: Controlling and tracking changes of data object collections.
11
x | RDA PIT WG | Custom data citation: Construct custom aggregates of multiple independent sources and provide citation information.
12
xx | RDA PIT WG | Modifying data: Provide accountability of object modification and replacement in data infrastructures to ensure a fundamental level of service quality.
13
x | RDA PIT WG | Provenance tracing: Connect objects with their predecessors across repositories and provide forensic tools.
14
x | Dan K summary of RDA/Force11 Software ID WG | Access: Get the artifact.
15
xxx | Dan K summary of RDA/Force11 Software ID WG | Archive: ensure (research) artifacts are not lost; they must be properly archived to ensure we can retrieve them at a later time.
16
x | Dan K summary of RDA/Force11 Software ID WG | Refer: ensure (research) artifacts can be precisely identified; software artifacts must be properly referenced to ensure we can identify the exact code, among many potentially archived copies, used for reproducing a specific experiment.
17
x | Dan K summary of RDA/Force11 Software ID WG | Describe: make it easy to discover (research) software artifacts; they must be equipped with proper metadata to make it easy to find them in a catalog or through a search engine.
18
x | Dan K summary of RDA/Force11 Software ID WG | Credit: ensure proper credit is given to authors and contributors; research software must be properly cited in research articles in order to give credit to all who contributed to it.
19
xx | ESIP Summer Meeting (2023) | Trace an author for questions/contact for sources (for data, for software, for extra features): ORCID.
20
x | ESIP Summer Meeting (2023) | Make user registration and data entry easier (reduce duplication).
21
xxx | ESIP Summer Meeting (2023) | Single source of truth that is machine accessible.
22
x | ESIP Summer Meeting (2023) | Pedigree of organization (assess trust in the organization behind the data).
23
x | ESIP Summer Meeting (2023) | Associate other data from an organization.
24
x | ESIP Summer Meeting (2023) | Disambiguation of related models.
25
x | ESIP Summer Meeting (2023) | Link between different registries of things.
26
x | ESIP Summer Meeting (2023) | Tracking how organizations change.
27
x | ESIP Summer Meeting (2023) | Linking data.
28
x | ESIP Summer Meeting (2023) | Linking subject matter experts.
29
x | ESIP Summer Meeting (2023) | Identify model concepts.
30
x | ESIP Summer Meeting (2023) | Linking data and outcomes to research artifacts traditionally forgotten in the credit system (physical objects like cores, cutting sets, thin sections…, equipment, software) and hopefully getting citations/credit.
31
x | ESIP Summer Meeting (2023) | Citing lines of software.
32
x | ESIP Summer Meeting (2023) | Metadata for instruments that are collecting data.
33
xx | ESIP Summer Meeting (2023) | Be able to search for data in a database by PID/vocabularies.
34
x | ESIP Summer Meeting (2023) | Citing project participants and linking those participants to the project members, who are part of a working group, who are part of several organizations, etc.
35
x | ESIP Summer Meeting (2023) | Model input and output used operationally.
36
x | ESIP Summer Meeting (2023) | Reflect closeness/similarity of content.
37
xx | ESIP Summer Meeting (2023) | DOI infrastructure overrides content negotiation. (E.g. Pangaea uses DOIs. If you request an RDF representation of a data set, you will be directed to a DataCite server (crosscite), which hosts an abridged version of the data (a version of the DataCite metadata), rather than to the Pangaea server that hosts the entire data set.)
38
xx | ESIP Summer Meeting (2023) | Challenges with large data sets. (How large is large? TB, PB? We have solutions for up to 20 GB and non-sanctioned solutions for up to 200 GB.) Our DataCite agreement is for datasets specifically, although I need to investigate that.
39
xx | ESIP Summer Meeting (2023) | Content-based identifiers may be more difficult to discover.
40
xxx | ESIP Summer Meeting (2023) | Should multiple systems be used? Might be confusing to users as well as to researchers who want to know how many times their research is cited. One potential solution would be to have one of the systems redirect to the other (this has to be done manually, which is labor intensive) (RAiDs: https://www.raid.org.au/).
41
xx | ESIP Summer Meeting (2023) | What do you need in addition to identifying the journal article to have any hope of reproducing it?
42
xx | ESIP Summer Meeting (2023) | How can you be totally transparent?
43
xxxxx | ESIP Summer Meeting (2023) | How do you deal with multiple copies?
44
xxx | ESIP Summer Meeting (2023) | How long should a DOI persist?
45
ESIP Summer Meeting (2023) | What procedures should exist if the PID authority ends?
46
xx | ESIP Summer Meeting (2023) |   Hopefully get plenty of advance warning.
47
xxx | ESIP Summer Meeting (2023) |   Mechanism to transfer to something else if it goes down.
48
xxx | ESIP Summer Meeting (2023) |   What happens to the DOI landing page?
49
xxx | ESIP Summer Meeting (2023) | Different kinds of PIDs have different kinds of metadata. This will require recommendations for specific resource types.
50
xxx | ESIP Summer Meeting (2023) | Some data sets are continually changing. This can happen unpredictably. The researcher will not be able to mint a new DOI every time data is added/changed.
51
xx | ESIP Summer Meeting (2023) | Hashes do not indicate any relations or closeness in content.
52
x | ESIP Summer Meeting (2023) | Identifiers for sub-parts of data sets, concepts in an ontology.
53
xx | ESIP Summer Meeting (2023) | Would be nice to have some mechanism of determining/indicating similarity or closeness.
54
xxx | ESIP Summer Meeting (2023) | Reproducibility. When users cite the data, how do others trace back to everything, including the preprocessing work?
55
xxx | ESIP Summer Meeting (2023) | How can a system indicate the level of usability for data sets?
56
xx | ESIP Summer Meeting (2023) | How can a PID system ensure that it will be sustainable and functional over time?
57
xx | ESIP Summer Meeting (2023) | Trust that PID systems will be sustainable and functional over time?
58
xx | ESIP Summer Meeting (2023) | Software citation and identification are really complex. SHA256 can be used for software (and sometimes is). [One issue I can think of: I think SHA hashes are assigned to each commit in git? Some people like to rewrite their history while working on branches, which changes the SHAs. So, if working on branches and rewriting history, the SHAs would change before merging into main.]
59
x | ESIP Summer Meeting (2023) | If you put something in, you can still get it out.
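The note above about commit hashes changing when history is rewritten can be sketched as follows. This is a minimal illustration, assuming git's default SHA-1 object format (SHA-256 is an opt-in alternative). The author identity, timestamp, tree ID, and parent IDs are made-up placeholders, and `git_object_id` and `commit_body` are hypothetical helper names. It suggests that a file's blob ID depends only on its content, while a commit ID also depends on its parent, so a rebase yields a new commit ID for identical content.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way git does: '<type> <size>' + NUL byte + body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, parent_id: str, message: str) -> bytes:
    """Build the text of a commit object (identity and time are placeholders)."""
    person = "A. Researcher <a.researcher@example.org> 1690000000 +0000"
    lines = [
        f"tree {tree_id}",
        f"parent {parent_id}",
        f"author {person}",
        f"committer {person}",
        "",
        message,
        "",
    ]
    return "\n".join(lines).encode("utf-8")


# A blob ID is derived from the file content alone.
blob_id = git_object_id("blob", b"print('analysis step')\n")

# Placeholder IDs: the same tree (same files) committed on two different
# parents, as happens when a branch is rebased.
tree_id = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
parent_before_rebase = "1" * 40
parent_after_rebase = "2" * 40

commit_before = git_object_id(
    "commit", commit_body(tree_id, parent_before_rebase, "Add analysis")
)
commit_after = git_object_id(
    "commit", commit_body(tree_id, parent_after_rebase, "Add analysis")
)

print("blob:                ", blob_id)
print("commit before rebase:", commit_before)
print("commit after rebase: ", commit_after)
```

Under these assumptions, citing a blob or tree ID would be more stable across history rewrites than citing a commit ID, although whether that suits a given citation practice is a separate question.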
60
61
Comments
62
ESIP Summer Meeting (2023) | Snapshot data monthly for whole collection.
63
ESIP Summer Meeting (2023) | DOI assigned at cruise level for credit reasons; plan to assign one for each snapshot.
64
ESIP Summer Meeting (2023) | So will have multiple IDs for largely the same data.
65
ESIP Summer Meeting (2023) | Published mix of blog articles and web pages that will need to move.
66
ESIP Summer Meeting (2023) | Would want PIDs for these, so we don't have to redirect.
67
ESIP Summer Meeting (2023) | Also created software, which is complicated to cite.
68
ESIP Summer Meeting (2023) | When is a URL persistent?
69
ESIP Summer Meeting (2023) | Jupyter Notebooks: assigning PIDs, but what if associated components change?
70
ESIP Summer Meeting (2023) | How to know how reproducible the whole workflow was?
71
ESIP Summer Meeting (2023) | Never asserted that the data are the same over time.
72
ESIP Summer Meeting (2023) | NASA has DOIs for data archive.
73
ESIP Summer Meeting (2023) | Contributing data repository wants the archive to reflect the original DOI.
74
ESIP Summer Meeting (2023) | What characteristics of PID systems are important to deciding whether/how to:
75
ESIP Summer Meeting (2023) | Use PIDs for the different use cases discussed earlier?
76
ESIP Summer Meeting (2023) | You agree to this when assigning DOIs.
77
ESIP Summer Meeting (2023) | How broadly adopted they are.
78
ESIP Summer Meeting (2023) | How much community buy-in these systems have.
79
ESIP Summer Meeting (2023) | Coverage of the community in terms of who is using it.
80
ESIP Summer Meeting (2023) | Well scoped.
81
ESIP Summer Meeting (2023) | What they identify, how persistent it is, how scalable.
82
ESIP Summer Meeting (2023) | How timely they are.
83
ESIP Summer Meeting (2023) | Reproducibility is a challenging concept.
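Two points raised in these notes, that hashes do not indicate closeness in content and that periodic snapshots will produce multiple IDs for largely the same data, can be illustrated with a small sketch. It assumes two made-up snapshot records that differ in a single character and uses Python's standard `hashlib` and `difflib`. `SequenceMatcher` is only one stand-in for a similarity measure, not something the notes prescribe, and `sha256_hex` is a hypothetical helper name.

```python
import hashlib
from difflib import SequenceMatcher


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a text record as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up records standing in for two snapshots of largely the same data.
record_v1 = "station=A12;depth_m=10.0;temp_c=4.31"
record_v2 = "station=A12;depth_m=10.0;temp_c=4.32"

digest_v1 = sha256_hex(record_v1)
digest_v2 = sha256_hex(record_v2)

# Similarity of the content itself (1.0 means identical).
content_similarity = SequenceMatcher(None, record_v1, record_v2).ratio()

# Similarity of the two digests, compared as plain strings.
digest_similarity = SequenceMatcher(None, digest_v1, digest_v2).ratio()

print("digest v1:", digest_v1)
print("digest v2:", digest_v2)
print("content similarity:", round(content_similarity, 3))
print("digest similarity: ", round(digest_similarity, 3))
```

The content similarity stays close to 1.0 while the digests are unrelated strings, so any notion of closeness between snapshots would need to be recorded separately, for example as relation metadata alongside the identifiers.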