| Initial questions | Datasharing checklist | Additional details for specific cases | Other common questions | For code |
| What kind of data is being shared? File size? Compression? Updatable? | Description of the project/dataset: What was the project about, why share this data, and who should use it? Who are the individuals involved (if more than just yourself), and what are their contact details? Are there any plans or a schedule for updating the data, and how can users submit issues/requests (GitHub has this built in)? | Dataset builders seeking collaboration: To formalize the dataset as a publication you may also want to assign a Digital Object Identifier (DOI). If you use GitHub, one of the recommended ways to add a DOI is via Zenodo, see https://guides.github.com/activities/citable-code/. DataCite (https://datacite.org/) also provides many options for assigning a DOI to a dataset. | What upstreams contribute to your work / what toolchain is used? How do you note when they change; when does this trigger a recompilation? Do you have a process-dependency tree? | Repository structure: start a GitHub or GitLab repository, add a description, choose a license |
| Where should it be hosted? For how long? Do I require registration info for others to use my data? Do I need a download counter? | Describe the data: data files + datasets (size, relevance to what sort of analysis, etc.); data schemas and/or field descriptions. How was the data produced? Method + data cleaning. Was there any preregistration of statistical methods? How is the data updated (if relevant)? | On Datasharing: Matt Marx's writeup on his sharing of data + process for Reliance on Science | Reuse: What sorts of reuse are expected? What derivative datasets or recombinations with other data layers are collaborators interested in making? | Reuse: What sorts of reuse do you expect? This includes modules used with other code, or plugins developed by others to make your code work for their use cases. |
| Affiliation: any requirements or resources associated with that affiliation? Do I plan to maintain the data? Do I plan to respond to queries? | Describe the code: Document the source code and process used to create the dataset. How to use and build on the data (e.g. tunable parameters in key steps, explicit nods to replication, etc.) | To formalize the dataset as a publication: you may also want to assign a Digital Object Identifier (DOI). If you use GitHub, an easy way to add a DOI is via Zenodo, see https://guides.github.com/activities/citable-code/. DataCite (https://datacite.org/) also provides many options for assigning a DOI to a dataset. | How are dumps provided: name, format, versioning? Is there a feed of updates to dumps? | Formalize the code as a publication: you can get a DOI via Zenodo, see https://guides.github.com/activities/citable-code/ |
| | External links: URL / link to other datasets/software/code used; URL / link to related papers | Examples of datasets in GitHub: https://github.com/lmatthia/publisher-oa-portfolios · https://github.com/CSSEGISandData/COVID-19 | What sources are used; drawn from what set? Are the sources versioned? Regularly crawled? | |
| | Terms of use: How should it be cited (‘If you use the data, please cite…’)? Any license details (code, other software or third-party data and associated copyright or terms of use, etc.) | Sharing a Public Dataset - RonS example: https://zenodo.org/record/3685972 | What downstreams are using this work? Is that visible to other reusers, via pingbacks or other? | |
| | | Sharing a Dataset in-progress: example: IProduct: http://www.iproduct.io/ | Is this used in any metastudies? What schema or other mappings; fuzzings or anonymization; or other processing is used to make each one work? Is the metastudy map encoded in a named package that others could use? | |
| | | | How was data chosen for measurement/inclusion? How is it noted when this changes? | |
| | | | What data cleaning, noise correction, or other feedback loops did you use in compiling the data? | |
| | | | What similar efforts or alternatives exist? | |
| | | | What is the whole tale of your work -- what is needed to replicate it? Is this articulated in a [whole tale] file? Does that include workflow + usage notes? | |
| | | | Has your process been replicated in practice? By how many independent parties? | |