current metadata schema with proposed changes

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y
1	This is a "spreadsheet version" of the current InvenioRDM metadata schema. I also added a few extra fields that we said we want as separate fields so that we can query them; their names start with a "*". Feel free to add more.
2	NOTE: columns with headings in italics denote optional fields.
3	resource_type	Creators	Title	Additional title	publisher	publication_date	subjects	Contributors	Date	Languages	identifiers	sizes	format	version	Rights	Description	additional_descriptions	locations	Funding	references	*Temporal resolution	*Temporal coverage	*Spatial resolution	*Spatial coverage	Relationships

4	Currently there is a predefined list of possible values; this is easy to modify.	This has 3 sub-fields for each creator: 1) "person-or-org", which holds the actual name and is required; 2) "role"; 3) "affiliation".	Primary title of the record.	For acronyms, subtitles, etc.	I know we said we might not need this because it might be obvious, but since it is already there, before getting rid of it we should make sure it really is obvious elsewhere in every case. Maybe the easiest thing is to leave it but make it optional, as we might also list data that isn't published.	This might be tricky in some cases, so we could keep it as optional; it can sometimes help clarify the age of the data.	List; each item has 3 sub-fields: 1) "subject"; 2) "identifier": an 'id' or 'code' identifying the subject in a controlled vocabulary; 3) "scheme": the controlled vocabulary. This is really useful as it is the basis for keywords, but we should find out which controlled vocabulary we can use. NB: the only required sub-field is "subject", so potentially you can use anything.	Contributors in order of importance. This is like creators, except that the "role" sub-field is also required.	Date or date interval. List; each item has 3 sub-fields: "date", "type" and "description". This refers to the metadata record itself rather than the data, as clarified by the "description", which can be "accepted", "available", etc. "type" is "fromdate" or "todate".	Languages: a list of languages used in the record. This might be redundant, as we are always using English, but as with similar fields such as "date" and "publisher", it might be useful if you are exporting the record to an official schema.	Alternate identifiers for the record. This is what we should use for the actual DOIs, metadata records, etc.
Each item has 4 sub-fields: 1) "identifier"; 2) "scheme", like "doi.org"; 3) "relation_type": the values are controlled by a vocabulary, and we might want to add some if needed; 4) "resource_type": this uses the same vocabulary as col A. We might want to change this, or at least make sure that there are also values relevant to this use case.	Order of magnitude: MB, GB, TB	Proposed controlled vocabulary: netcdf, binary, CSV, ascii, shape file, TIFF/GEOTIFF, other.		Any license or copyright information for this resource. List; each item has 4 sub-fields: 1) "id"; 2) "title": the license name or the license itself; 3) "description": the license description; 4) "link": the URL of the license, if available.	The actual description of the dataset.	List of extra optional fields. Each has 3 sub-fields: 1) "description"; 2) "type": this probably uses a controlled vocabulary; 3) "language".	Geographical locations relevant to this record. This part of the schema is modelled on GeoJSON, but without the scope to embed arbitrary metadata. A field `features` is nested within `locations` to give scope later to say something about the locations as a whole. We still need to check how this would work; it looks like they don't allow a bounding box, although it is part of the GeoJSON format they use.	This has two main sub-fields with their own elements: funder and award. I'm not sure if we need this info, as it should be in the main official metadata.	This is an optional field which should be useful. Just as for a paper, you can list here any document or source you mention in the main dataset "description". It is a list; each item has 3 sub-fields: 1) "reference": the actual reference string; 2) "identifier": an identifier, if any; 3) "scheme": the kind of identifier, which could just be "url".					Derived from, produced by (what software), etc.
5
6	Examples
7	data collection - model	ESGF or whatever organisation identify the group effort	Coupled Model Intercomparison Project Phase 6	CMIP6	ESGF	NA	IPCC, model ...	NA	start of project from-date?		1) esgf url 2) https://opus.nci.org.au/display/CMIP/CMIP+Community+Home ....		NetCDF	NA	Open	A collection of standard climate experiments run by many institutions using different numerical models, with common experiment protocols, forcings and outputs	NCI Directory: /g/data/oi10/replicas/CMIP6 /g/data/fs38/publications/CMIP6. Intake Catalogue: /g/data/hh5/public/apps/nci-intake-catalogue/esgf/cmip6/catalogue.json NCI notes: Project oi10 has data replicated from other ESGF nodes Project fs38 has data published by NCI			https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6	3hr, 6hr, day, mon, yr	various	various	Global
8																		NCI		doi:10.5194/gmd-9-1937-2016
9	dataset - reanalysis	NERSC - CIRES	20th Century Reanalysis	20CRv3			large_ensemble				https://psl.noaa.gov/data/20thC_Rean/ , https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html https://portal.nersc.gov/project/20C_Reanalysis/		NetCDF grib	v3	open / License is custom	NOAA-CIRES-DOE 20th Century Reanalysis V3 contains objectively-analyzed 4-dimensional weather maps and their uncertainty from the early 19th century to the 21st century.	NCI Directory: /g/data/ua8/LE_models/20CRv3/	NCI			3hr, day, mon	1806-2015	1degX1deg	global
10	dataset- observation - satellite	NASA		TRMM													NCI Directory /g/data/ua8/Precipitation/TRMM/3B42 /g/data/rr5/satellite/products/trmm NCI Notes Project ua8 has files in HDF and NetCDF format for the full period Project rr5 has HDF only covering 1998 to 2017
11			Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis	TRMM 3B42	NASA		precipitation				doi: https://doi.org/10.5067/TRMM/TMPA/3H/7		HDF NetCDF	v7		This dataset is the output from the TMPA (TRMM Multi-satellite Precipitation Analysis) Algorithm, and provides precipitation estimates in the TRMM regions that have the (nearly-zero) bias of the "TRMM Combined Instrument" precipitation estimate and the dense sampling of high-quality microwave data with fill-in using microwave-calibrated infrared estimates. The granule temporal coverage is 3 hours.		NCI			3hr	1979-12-31/2020-01-01	0.25 deg X 0.25 deg	60N - 60S
12	dataset - model	Department of Planning, Industry and Environment	NSW/ACT Regional Climate Modelling Project 1.5	NARCliM1.5	Department of Planning, Industry and Environment	2021	Regional climate projections						HDF NetCDF			Downscaled regional climate projections at 50km over the Australasia CORDEX domain and 10 km over southeast Australia.		Australia	NSW State Government.	Nishant et al. (2021)	1hr to annual	1951-2100	~50km and ~10 km	Australia
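To make the sub-field notes in row 4 a bit more concrete, here is a minimal Python sketch of how one of the example rows (20CRv3) might look as a nested record, with a small check for the sub-fields that the notes mark as required. This is only an illustration under assumptions: the key names (`person_or_org`, `additional_title`, `temporal_resolution` and so on), the `REQUIRED_SUBFIELDS` table and the `missing_subfields` helper are hypothetical, each creator is simplified to a plain name string, and none of this is the actual InvenioRDM API.

```python
# Hypothetical sketch: key names mirror the column headings and the sub-fields
# described in row 4. They are assumptions, not the real InvenioRDM schema.

# Sub-fields that the row 4 notes describe as required for each list field.
REQUIRED_SUBFIELDS = {
    "creators": ["person_or_org"],               # role and affiliation are optional
    "contributors": ["person_or_org", "role"],   # role is also required here
    "subjects": ["subject"],                     # identifier and scheme are optional
}


def missing_subfields(record):
    """Return (field, item index, sub-field) for each required sub-field
    that is absent or empty in the record."""
    problems = []
    for field, required in REQUIRED_SUBFIELDS.items():
        for i, item in enumerate(record.get(field, [])):
            for key in required:
                if not item.get(key):
                    problems.append((field, i, key))
    return problems


# Values taken from the 20CRv3 example row. The last four keys stand in for
# the proposed "*" columns.
record = {
    "resource_type": "dataset - reanalysis",
    "title": "20th Century Reanalysis",
    "additional_title": "20CRv3",
    "creators": [{"person_or_org": "NERSC - CIRES"}],
    "subjects": [{"subject": "large_ensemble"}],
    "version": "v3",
    "temporal_resolution": ["3hr", "day", "mon"],
    "temporal_coverage": "1806-2015",
    "spatial_resolution": "1degX1deg",
    "spatial_coverage": "global",
}

print(missing_subfields(record))  # → []
```

Keeping the starred fields as separate top-level keys, rather than burying them in the description, is what would let us query on them later; adding a contributor without a "role" to the record above would make the check report it.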