This is a "spreadsheet version" of the current InvenioRDM metadata schema. I also added a few extra fields that we said we want as separate fields so we can query them; they start with a "*". Feel free to add more.
NOTE: columns with headings in italics denote optional fields.
Fields (one column each): resource_type, Creators, Title, Additional title, publisher, publication_date, subjects, Contributors, Date, Languages, identifiers, sizes, format, version, Rights, Description, additional_descriptions, locations, Funding, references, *Temporal resolution, *Temporal coverage, *Spatial resolution, *Spatial coverage, Relationships
resource_type: Currently there is a predefined list of possible values; this is easy to modify.
Creators: This has 3 sub-fields for each creator: 1) "person-or-org", which holds the actual name and is required; 2) "role"; 3) "affiliation".
Title: Primary title of the record.
Additional title: For acronyms, subtitles, etc.
publisher: I know we said we might not need this because it might be obvious. Since it's already there, though, we should make sure it really is obvious elsewhere in every case before getting rid of it. The easiest option may be to keep it but make it optional, as we might also list data that isn't published.
publication_date: This might be tricky in some cases, so we could keep it optional; it can sometimes help clarify the age of the data.
subjects: A list; each item has 3 sub-fields: 1) "subject"; 2) "identifier": an id or code identifying the subject in a controlled vocabulary; 3) "scheme": the controlled vocabulary. This is really useful, as it's the basis for keywords, but we need to find a controlled vocabulary we can use. NB: the only required sub-field is "subject", so potentially you can use anything.
Contributors: Contributors in order of importance. This is like Creators, except that the "role" sub-field is also required.
Date: A date or date interval. A list; each item has 3 sub-fields: "date", "type" and "description". This refers to the metadata record itself rather than the data, as clarified by the "description", which can be "accepted", "available", etc.; "type" is "fromdate" or "todate".
Languages: A list of languages used in the record. This might be redundant, as we always use English. However, like other similar fields such as "date" and "publisher", it might be useful when exporting the record to an official schema.
identifiers: Alternate identifiers for the record. This is what we should use for the actual DOIs, metadata records, etc. Each item has 4 sub-fields: 1) "identifier"; 2) "scheme", e.g. "doi.org"; 3) "relation_type": the values are controlled by a vocabulary, and we might want to add some if needed; 4) "resource_type": this uses the same vocabulary as column A (resource_type). We might want to change this, or at least make sure it also includes values relevant to this use case.
sizes: Order of magnitude: MB, GB, TB.
format: Proposed controlled vocabulary:
netcdf,
binary,
CSV,
ascii,
shape file,
TIFF/GEOTIFF,
other.
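The required sub-fields described above can be checked automatically: a name in "person-or-org" for every creator and contributor, a "role" for every contributor and a "subject" for every subjects item. The proposed format vocabulary can be checked the same way. What follows is a minimal sketch. It assumes a record is held as a plain Python dict keyed by the field names in this sheet. `check_record` and the dict layout are hypothetical; they are not the InvenioRDM API or its JSON schema.

```python
# Hypothetical validator for the sub-field rules proposed in this sheet.
# Records are plain dicts keyed by the spreadsheet's field names;
# this is NOT the InvenioRDM API or its JSON schema.

# Proposed controlled vocabulary for "format" (compared case-insensitively).
FORMAT_VOCAB = {"netcdf", "binary", "csv", "ascii", "shape file", "tiff/geotiff", "other"}

# Required sub-fields for each list-valued field, as described in the sheet.
REQUIRED_SUBFIELDS = {
    "creators": {"person-or-org"},
    "contributors": {"person-or-org", "role"},
    "subjects": {"subject"},
}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, required in REQUIRED_SUBFIELDS.items():
        for i, item in enumerate(record.get(field, [])):
            # A sub-field that is absent or empty counts as missing.
            present = {key for key, value in item.items() if value}
            for name in sorted(required - present):
                problems.append(f"{field}[{i}]: missing '{name}'")
    for fmt in record.get("format", []):
        if fmt.lower() not in FORMAT_VOCAB:
            problems.append(f"format: '{fmt}' not in proposed vocabulary")
    return problems


# Illustrative record; the contributor name is hypothetical.
record = {
    "creators": [{"person-or-org": "NASA"}],
    "contributors": [{"person-or-org": "Example Contributor"}],  # no role
    "subjects": [{"subject": "precipitation"}],
    "format": ["NetCDF", "HDF"],
}
print(check_record(record))
# ["contributors[0]: missing 'role'", "format: 'HDF' not in proposed vocabulary"]
```

This also shows that "HDF", which appears in several of the examples, is not yet in the proposed format vocabulary.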
Rights: Any license or copyright information for this resource. A list; each item has 4 sub-fields: 1) "id"; 2) "title": the license name or the license itself; 3) "description": the license description; 4) "link": the URL of the license, if available.
Description: The actual description of the dataset.
additional_descriptions: A list of extra optional descriptions. Each has 3 sub-fields: 1) "description"; 2) "type", which probably uses a controlled vocabulary; 3) "language".
locations: Geographical locations relevant to this record. This part of the schema is modelled on GeoJSON, but without the scope to embed arbitrary metadata. A field `features` is nested within `locations` to leave room to say something later about the locations as a whole. We still need to check how this would work: it looks like bounding boxes aren't allowed, although they are part of the GeoJSON format used.
Funding: This has two main sub-fields, each with its own elements: funder and award. I'm not sure we need this information, as it should be in the main official metadata.
references: An optional field that should be useful. Just as for a paper, you can list here any document or source mentioned in the main dataset "description". It is a list; each item has 3 sub-fields: 1) "reference": the actual reference string; 2) "identifier": an identifier, if any; 3) "scheme": the kind of identifier, which could just be "url".
Relationships: Derived from, produced by (what software), etc.
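On the open question about bounding boxes: a box can always be written as an ordinary GeoJSON Polygon, which `features` should accept even if a `bbox` member is not. What follows is a minimal sketch. It assumes each feature carries its shape in a `geometry` key and a human-readable label in a `place` key; both are assumptions to check against the InvenioRDM schema. `bbox_to_polygon` is a hypothetical helper.

```python
import json


def bbox_to_polygon(west: float, south: float, east: float, north: float) -> dict:
    """Return a GeoJSON Polygon geometry covering a lon/lat bounding box.

    Positions are [longitude, latitude]. The single (exterior) ring is
    closed by repeating the first position and runs counterclockwise,
    following the right-hand rule in RFC 7946.
    """
    if west >= east or south >= north:
        # Boxes crossing the 180th meridian (west > east) are not handled here.
        raise ValueError("expected west < east and south < north")
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical "locations" entry for the TRMM spatial coverage (60N - 60S).
locations = {
    "features": [
        {
            "geometry": bbox_to_polygon(-180.0, -60.0, 180.0, 60.0),
            "place": "60N - 60S",
        }
    ]
}
print(json.dumps(locations["features"][0]["geometry"]))
```

RFC 7946 recommends cutting any geometry that crosses the antimeridian into two parts. Boxes like that would therefore need to be split into two polygons rather than passed to this helper.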
Examples
resource_type: data collection - model
Creators: ESGF, or whatever organisation identifies the group effort
Title: Coupled Model Intercomparison Project Phase 6
Additional title: CMIP6
publisher: ESGF
publication_date: NA
subjects: IPCC, model ...
Contributors: NA
Date: start of project (from-date?)
identifiers: 1) ESGF URL; 2) https://opus.nci.org.au/display/CMIP/CMIP+Community+Home ....
format: NetCDF
version: NA
Rights: Open
Description: A collection of standard climate experiments run by many institutions using different numerical models, with common experiment protocols, forcings and outputs.
additional_descriptions: NCI Directory: /g/data/oi10/replicas/CMIP6, /g/data/fs38/publications/CMIP6. Intake Catalogue: /g/data/hh5/public/apps/nci-intake-catalogue/esgf/cmip6/catalogue.json. NCI notes: Project oi10 has data replicated from other ESGF nodes; Project fs38 has data published by NCI.
references: https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6
*Temporal resolution: 3hr, 6hr, day, mon, yr
*Temporal coverage: various
*Spatial resolution: various
*Spatial coverage: Global
Also listed: NCI; doi:10.5194/gmd-9-1937-2016
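The examples write DOIs in different ways: "doi:10.5194/..." for CMIP6 and "doi: https://doi.org/..." for TRMM. Storing them uniformly in the identifiers field would make them easier to match and query. What follows is a minimal sketch. It assumes we store the bare DOI with a scheme value of "doi". The sheet mentions "doi.org", so the scheme is a parameter; whichever value is picked should be used consistently. `normalize_doi` is a hypothetical helper.

```python
import re

# Optional "doi:" label and/or resolver URL in front of the DOI itself.
_DOI_PREFIX = re.compile(r"^\s*(?:doi:\s*)?(?:https?://(?:dx\.)?doi\.org/)?", re.IGNORECASE)
# A DOI starts with the "10." directory indicator, a registrant code and a suffix.
_DOI_BODY = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(raw: str, scheme: str = "doi") -> dict:
    """Return an identifiers item holding the bare DOI found in `raw`."""
    doi = _DOI_PREFIX.sub("", raw, count=1).strip()
    if not _DOI_BODY.fullmatch(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return {"identifier": doi, "scheme": scheme}


print(normalize_doi("doi:10.5194/gmd-9-1937-2016"))
print(normalize_doi("doi: https://doi.org/10.5067/TRMM/TMPA/3H/7"))
```

Rejecting anything that does not look like a DOI keeps free-text references (such as "Nishant et al. (2021)") out of the identifiers field. Those belong in references instead.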
resource_type: dataset - reanalysis
Creators: NERSC - CIRES
Title: 20th Century Reanalysis
Additional title: 20CRv3
subjects: large_ensemble
identifiers: https://psl.noaa.gov/data/20thC_Rean/, https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html, https://portal.nersc.gov/project/20C_Reanalysis/
format: NetCDF, grib
version: v3
Rights: open / License is custom
Description: NOAA-CIRES-DOE 20th Century Reanalysis V3 contains objectively-analyzed 4-dimensional weather maps and their uncertainty from the early 19th century to the 21st century.
additional_descriptions: NCI Directory: /g/data/ua8/LE_models/20CRv3/
Also listed: NCI
*Temporal resolution: 3hr, day, mon
*Temporal coverage: 1806-2015
*Spatial resolution: 1 deg x 1 deg
*Spatial coverage: global
resource_type: dataset - observation - satellite
Creators: NASA
Title: Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TRMM)
Additional title: TRMM 3B42
publisher: NASA
subjects: precipitation
identifiers: doi: https://doi.org/10.5067/TRMM/TMPA/3H/7
format: HDF, NetCDF
version: v7
Description: This dataset is the output from the TMPA (TRMM Multi-satellite Precipitation Analysis) algorithm, and provides precipitation estimates in the TRMM regions that have the (nearly-zero) bias of the "TRMM Combined Instrument" precipitation estimate and the dense sampling of high-quality microwave data with fill-in using microwave-calibrated infrared estimates. The granule temporal coverage is 3 hours.
additional_descriptions: NCI Directory: /g/data/ua8/Precipitation/TRMM/3B42, /g/data/rr5/satellite/products/trmm. NCI Notes: Project ua8 has files in HDF and NetCDF format for the full period; Project rr5 has HDF only, covering 1998 to 2017.
Also listed: NCI
*Temporal resolution: 3hr
*Temporal coverage: 1979-12-31/2020-01-01
*Spatial resolution: 0.25 deg x 0.25 deg
*Spatial coverage: 60N - 60S
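*Temporal coverage is one of the fields added so that it can be queried, but the examples use different notations. They include year ranges ("1806-2015", "1951-2100"), an ISO 8601 interval ("1979-12-31/2020-01-01") and free text ("various"). What follows is a minimal sketch that normalises these to start and end dates, so that an overlap query becomes a simple comparison. The helper names are hypothetical. A bare year is assumed to mean 1 January to 31 December.

```python
import re
from datetime import date
from typing import Optional, Tuple


def _parse_bound(part: str, end: bool) -> date:
    """Parse 'YYYY' or 'YYYY-MM-DD'; a bare year expands to Jan 1 or Dec 31."""
    part = part.strip()
    if re.fullmatch(r"\d{4}", part):
        return date(int(part), 12, 31) if end else date(int(part), 1, 1)
    return date.fromisoformat(part)


def parse_temporal_coverage(text: str) -> Optional[Tuple[date, date]]:
    """Return (start, end), or None for free text such as 'various'."""
    text = text.strip()
    if "/" in text:  # ISO 8601 interval, e.g. 1979-12-31/2020-01-01
        start, stop = text.split("/", 1)
    else:
        m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", text)  # year range, e.g. 1806-2015
        if m is None:
            return None
        start, stop = m.groups()
    return _parse_bound(start, end=False), _parse_bound(stop, end=True)


def overlaps(coverage: Optional[Tuple[date, date]], start: date, stop: date) -> bool:
    """True if a parsed coverage overlaps the query period [start, stop]."""
    return coverage is not None and coverage[0] <= stop and start <= coverage[1]


for raw in ["1806-2015", "1979-12-31/2020-01-01", "1951-2100", "various"]:
    print(raw, parse_temporal_coverage(raw))
```

Storing the two parsed dates alongside the original text would keep the human-readable value while making the field queryable.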
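resource_type: dataset - model
Creators: Department of Planning, Industry and Environment
Title: NSW/ACT Regional Climate Modelling Project 1.5
Additional title: NARCliM1.5
publisher: Department of Planning, Industry and Environment
publication_date: 2021
subjects: Regional climate projections
format: HDF, NetCDF
Description: Downscaled regional climate projections at 50 km over the Australasia CORDEX domain and 10 km over southeast Australia.
locations: Australia
Funding: NSW State Government
references: Nishant et al. (2021)
*Temporal resolution: 1hr to annual
*Temporal coverage: 1951-2100
*Spatial resolution: ~50 km and ~10 km
*Spatial coverage: Australia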
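The point of the "*" fields is to query them, and the four examples show how much the cell format matters. What follows is a minimal sketch that treats *Temporal resolution as a comma-separated list and finds every dataset offering a given resolution. The values are copied from the examples above. The `name` and `temporal_resolution` keys and both helpers are hypothetical.

```python
# Values copied from the example rows; keys are hypothetical short names.
EXAMPLES = [
    {"name": "CMIP6", "temporal_resolution": "3hr, 6hr, day, mon, yr"},
    {"name": "20CRv3", "temporal_resolution": "3hr, day, mon"},
    {"name": "TRMM 3B42", "temporal_resolution": "3hr"},
    {"name": "NARCliM1.5", "temporal_resolution": "1hr to annual"},
]


def resolutions(value: str) -> set[str]:
    """Split a comma-separated resolution cell into lower-case tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def with_resolution(records: list[dict], wanted: str) -> list[str]:
    """Names of records whose temporal resolution list contains `wanted`."""
    return [r["name"] for r in records if wanted.lower() in resolutions(r["temporal_resolution"])]


print(with_resolution(EXAMPLES, "3hr"))  # ['CMIP6', '20CRv3', 'TRMM 3B42']
print(with_resolution(EXAMPLES, "day"))  # ['CMIP6', '20CRv3']
```

NARCliM1.5 is never matched, because "1hr to annual" is a free-text range rather than a list. This suggests a controlled vocabulary (for example 1hr, 3hr, 6hr, day, mon, yr) would make these fields far more useful for queries.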