1
This document is open to edit by anyone. It is shared publicly via the website.
Person 1, Person 2, Person 3, Person 4, Person 5, Person 6, Person 7, Person 8, Person 9, Person 10, Person 11, Person 12, Person 13. Enter your responses below.
2
Type, Attribute, Description of Attribute, Current Categorisation, then for each respondent: What is your considered category? Reason why?
3
EXAMPLE, Title, This is an example attribute, MUST, then for each respondent: SHOULD, Reasons why it should/should not be here/somewhere else
4
COREIdentifierAn unambiguous reference to the resource within a given context.MAYMUSTIt would be unthinkable not to have this! Surely data is almost useless after the prototyping stage without proof of source and version...MUSTEssential to ensure unique identification and citationMUSTA URI is required if the dataset is going to be joined into a knowledge graphMUSTPossibly the most important field in this list! We can't have a meaningful conversation about the world if we can't agree on how to identify all the things in the world :)MAYMUST
Unique identifier useful where names are similar for multiple items
MUSTMUSTNone of this works without stable identifiersMUSTto make sure data is linkable and reusableSHOULD
This is important to make sure people do not confuse the dataset with others. DOIs are good because weblinks can break or be changed.
MUSTto make sure data is linkable and reusableSHOULD
This is important to make sure people do not confuse the dataset with others. DOIs are good because weblinks can break or be changed.
MUSTAn unambiguous identifier is not only helpful for a dataset, but also for a version within a dataset. This could be covered by an update to the version, but for ease of linking an unambiguous link is great (see Zenodo for good practice on this, i.e. it provides a DOI for the version, but also a separate DOI for the latest version of a dataset)
5
CORETitleA name given to the resource. Typically, a Title will be a name by which the resource is formally known.MUSTMUSTMUSTMore human friendly! Helps differentiate between similar data setsMUSTMUSTMUSTMUSTMUSTMUSTMUSTMUSTMUSTSHOULDThe Title of the dataset might change over time, so some flexibility here might help (as the unambiguous link above should provide the link)
6
COREDescriptionThis is a free text description of the data set. Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.MUSTMUSTMUSTHelp people decide whether it is the right one before downloading /applying to use itMUSTSHOULDMAY
If the dataset is simple enough, and the title is clear enough, additional description should not be required.
MUSTMUSTMUSTMUSTMUSTMUSTMUST
7
CORECreatorAn entity responsible for making the resource.MUSTMUSTMUSTPart of the quality assessment of potential reusersSHOULDSHOULDMUSTMUSTMUSTMUSTbut this may not be easyMUSTMUSTbut this may not be easyMUSTSHOULDalways helpful to have some information here, wondering if should be optional, so that it does not preclude organisations or people from publishing data
8
COREPublisherThis is the entity responsible for making the data set available.MUSTMUSTMUSTPart of the quality assessment of potential reusersMUSTSHOULDMAY
If Creator == Publisher then one could be omitted to simplify.
MUSTMUSTSHOULDif there is a clear publisherMUSTSHOULDif there is a clear publisherMUSTSHOULDalways helpful to have some information here, wondering if should be optional, so that it does not preclude organisations or people from publishing data
9
CORELanguageA language of the resource.MAYSHOULDSHOULDMay be difficult retrospectively, but it is good practice and would help with accessibilityMAYMAYMAYSHOULDSHOULDSHOULDSHOULD
It is helpful but presumably most UK energy data is going to be in English anyway?
SHOULDSHOULD
It is helpful but presumably most UK energy data is going to be in English anyway?
MAY
10
COREContact PointRelevant contact information for the cataloged resource. Use of vCard is recommended [VCARD-RDF].SHOULDSHOULDWouldn't necessarily say vCard is the best standard - but not overly fussed. Are vCards not a Microsoft thing? I don't use them so not sure.SHOULDWould need to ensure it is easy to amend this as it will change over time!SHOULDSHOULDMAY
Lots of cases where this might not be available.
MUSTSHOULDSHOULDSHOULDSHOULDSHOULDSHOULDbetter practice for organisations could be a contact that is generic - that could be rerouted as staff change
11
TOPICALKeywordA keyword or tag describing the resource.MUSTMUSTMUSTNeed to be clear what the difference between this & keywordSHOULDMUSTMUST
Ideally with some kind of protection to prevent typos or case differences creating additional tags.
SHOULDHighly desirableSHOULDFine with making this mandatory, but this is a term which allows multiple values in DCAT and can therefore be satisfied with an empty list, is this what you intend?SHOULDif full text search is also available, key words may be optional.SHOULD
Keywords should adhere to a standard vocabulary.
SHOULDif full text search is also available, key words may be optional.SHOULD
Keywords should adhere to a standard vocabulary.
12
TOPICALSubjectA topic of the resource.MAYMAYMAYNeed to be clear what the difference between this & keywordMAYIf there is a pre-specified list of topics then I'd upgrade this to a SHOULDSHOULDAgree with Ayrton: It'd be useful to use a controlled vocab for common 'subjects' like 'solar power timeseries'SHOULDSHOULDSHOULDSHOULDSHOULDSHOULDSHOULD
13
PROVENANCE & LINEAGEProvenanceA statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation.SHOULDMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. SHOULDVery important but may be difficult to do retrospectively, would be MUST for new dataSHOULDSHOULDSHOULDSHOULDSHOULDSHOULDgood to know the history of the dataSHOULDSHOULDgood to know the history of the dataSHOULD
14
PROVENANCE & LINEAGEWas generated byAn activity that generated, or provides the business context for, the creation of the dataset.SHOULDMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. SHOULDSee note aboveSHOULDSHOULDShould also express whether the data comes from real physical metering, or simulation, or surveys, etc.MAY
Freeform? This could become so broad as to be useless.
SHOULDSHOULDSHOULDSHOULDSHOULDSHOULD
15
PROVENANCE & LINEAGEIs Version ofA related resource of which the described resource is a version, edition, or adaptation.SHOULDMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. SHOULDMAYIf you only have one version of the resource should you still create an `is version of` attribute for that dataset?SHOULDSHOULDMAYMAYSHOULDif version is not managed properly, it may cause confusion.MUST
Makes clear if it is an updated version of a series of data.
SHOULDif version is not managed properly, it may cause confusion.MUST
Makes clear if it is an updated version of a series of data.
16
PROVENANCE & LINEAGEVersionThe version number of a resource.SHOULDMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. MUSTImportant for identification of what you have and for things like data citation and FAIR dataSHOULDSHOULDMAY
Only useful if a form is given. SemVer?
MUSTMUSTSHOULDConsider specifying further (i.e. requiring semantic versioning) but also allow for cases where the resource is dynamic and versioning is not meaningfulSHOULDMUST
Helps to know the previous versions.
SHOULDMUST
Helps to know the previous versions.
MUST
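Several responses in this row suggest semantic versioning. As a minimal sketch, assuming a plain MAJOR.MINOR.PATCH form is what gets adopted, a catalogue could check the Version field as below. The function name `parse_version` is hypothetical, and the pattern deliberately ignores the pre-release and build-metadata suffixes of the full SemVer specification.

```python
import re

# Simplified core pattern: MAJOR.MINOR.PATCH, no leading zeros.
# Assumption: pre-release/build suffixes (e.g. "-rc.1") are out of scope here.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(text):
    """Return (major, minor, patch) as ints, or None if text is not in that form."""
    match = SEMVER_CORE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

Comparing the returned tuples orders versions numerically, so 1.10.0 sorts after 1.9.3. A dynamic resource with no meaningful version would simply leave the field empty rather than fail this check.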
17
PROVENANCE & LINEAGEVersion NotesA description of changes between this version and the previous version of the resource.MAYSHOULDWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. SHOULDNeed to know what has changed, especially for new dataMAYSHOULDSHOULDMUSTMUSTSHOULDSHOULDSHOULDSHOULDMAYif any substantive change - if not - a presumption that it is similar data but more of it
18
PROVENANCE & LINEAGEPrevious VersionThe previous version of a resource in a lineage.MAYSHOULDWithout provenance, data is not provably true. Any random data could be added without any indication of where it comes from. MAYSHOULDSHOULDMAYSHOULDSHOULDSHOULDgood to have some version history recordsSHOULDSHOULDgood to have some version history recordsSHOULDMAY
19
PROVENANCE & LINEAGEIs Replaced ByA related resource that supplants, displaces, or supersedes the described resource.MAYMUSTThe amount of time wasted by not knowing a dataset is superseded is highly problematic. Acknowledge this is difficult to achieve.MUSTMAYMAYSHOULD
Should if versioning at all.
MAYSHOULDSHOULDSHOULDSHOULDSHOULDMAY
20
PROVENANCE & LINEAGEWas derived fromA derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.MAYMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. SHOULDShould be known as part of the process, so for new data this should be thereSHOULDWill require N:1 mappingsMAYCould get messy when a dataset is derived from many datasetsMAYMAYMAYSHOULDMAYSHOULDMAY
21
TEMPORALTemporal CoverageTemporal characteristics of the resource.SHOULDSHOULDSHOULDSHOULDSHOULDSHOULDMust if temporally bounded.MAYSHOULDSHOULDSHOULDSHOULDSHOULDSHOULDneeds to have information on what the timestamp relates to (end of period, start of period, mid period) and operational data needs to be in ISO 8601
22
TEMPORALDate CreatedDate of creation of the resource.MUSTMUSTMUSTMUSTMUSTMUSTIf a fixed dataset (e.g. document). Not relevant if a stream.MUSTMUSTSHOULDResource creation time is tricky for some kinds of resource, I'd rather see this as 'SHOULD' in cases where it's applicable rather than forcing providers to populate a field with ambiguous semanticsMUSTMUSTMUSTMUST
23
TEMPORALDate ModifiedMost recent date on which the catalog entry was changed, updated or modified. Publish date according to ISO8601 standard.SHOULDMUSTWithout provenance and or date, data is not provably true. Any random data could be added without any indication of where it comes from. MUSTNeed to understand what you are looking at especially if the versioning isn't very informativeMUSTSHOULDMUSTDittoMUSTMUSTSHOULDSHOULDSHOULDSHOULDSHOULD
24
TEMPORALAccrual PeriodicityThe frequency with which items are added to a collection.MAYSHOULDCritical quality indicatorMAYSHOULDSHOULDMAYSHOULDSHOULDMAYSHOULD
It helps to know how often a resource is updated or re-released.
MAYSHOULD
It helps to know how often a resource is updated or re-released.
25
TEMPORALDate issuedDate of formal issuance (e.g., publication) of the item. Publish date according to ISO8601 standard.MUSTMUSTMUSTSHOULDMUSTMAYNot relevant to a stream. So context-dependent MUST.MUSTMUSTSHOULDSimilar to 'Date Created', there are kinds of data resources which don't fall into this modelSHOULDMUSTSHOULDMUST
26
TEMPORALTemporal ResolutionMinimum time period resolvable in the dataset. This is intended to provide a summary indication of the temporal resolution of the data distribution as a single value.MAYMAYMAYSHOULDSHOULDMUSTMust if temporally divided - again contextual.MAYMAYMAYSHOULD
It is very helpful to quickly know the granularity of time series data, particularly for the monitoring of networks (hourly, half-hourly etc.).
MAYSHOULD
It is very helpful to quickly know the granularity of time series data, particularly for the monitoring of networks (hourly, half-hourly etc.).
27
TEMPORALDate ValidDate and often a date range of validity of a resource. Publish date according to ISO8601 standard.SHOULDSHOULDSHOULDMAYMAYSHOULDSHOULDSHOULDSHOULDbut it may be difficult to maintain, and keep it up-to-dateSHOULDSHOULDbut it may be difficult to maintain, and keep it up-to-dateSHOULD
28
FORMAT & SCHEMAFormatDescribe the file format, physical medium, or dimensions of the resource.MUSTMUSTMUSTMUSTMAYMUST
Format(s) plural if this is an API - would expect content negotiation to provide, e.g., JSON, XML, potentially CSV, depending on the client request.
MUSTMUSTSHOULDThis doesn't feel like something that should be mandated. A lot of these terms only really apply to resources which have a 'downloadable' natureMUSTMUSTMUSTMUST
29
FORMAT & SCHEMAConforms ToAn established standard to which the described resource conforms.MAYSHOULDCritical quality indicatorMAYSHOULDSHOULDMAY
Must if it does conform.
MAYMAYMAYSHOULD
Is helpful to know what kind of standards the data conforms to, such as ISO8601 for time series.
MAYSHOULD
Is helpful to know what kind of standards the data conforms to, such as ISO8601 for time series.
30
LICENSING & RIGHTSLicenseA legal document giving official permission to do something with the resource.SHOULDMUSTNo licence means the data is, essentially, useless for any commercial use.SHOULDThis may be difficult retrospectively but all new datasets MUST have one!MUSTMUST
Use SPDX License identifier
SHOULDSHOULDSHOULDMUSTAmbiguous licensing precludes almost all potential use of a data setMUSTThis is a must, otherwise, people don't know what they are allowed to doSHOULDMUSTThis is a must, otherwise, people don't know what they are allowed to doSHOULD
31
LICENSING & RIGHTSRightsInformation about rights held in and over the resource.SHOULDMUSTNo licence means the data is, essentially, useless for any commercial use.SHOULDSHOULDSHOULDSHOULDSHOULDMUSTSHOULDMUSTSHOULD
32
SPATIALSpatial CoverageSpatial characteristics of the resource.SHOULDSHOULDSHOULDSHOULDSHOULDSHOULD
Must if this is relevant. Spatial dataset without these characteristics of limited discovery value.
SHOULDSHOULDSHOULDSHOULDSHOULDSHOULD
33
SPATIALSpatial Resolution In MetersMinimum spatial separation resolvable in a dataset, measured in meters.MAYMAYMAYSHOULDSHOULDFew energy datasets are dense, gridded datasets (that is, few energy datasets are on a regular geospatial grid.). Instead, resolution in terms of whether the data is from individual energy resources, or spatially aggregated in some way, would be usefulMAYMAYMAYMAYnot necessarily in meter, can be in other measures (eg. address level, postcode level, LSOA level)SHOULD
Can include other geographies like postcode level, LSOA level etc.
MAYnot necessarily in meter, can be in other measures (eg. address level, postcode level, LSOA level)SHOULD
Can include other geographies like postcode level, LSOA level etc.
34
SPATIALGeometryAssociates any resource with the corresponding geometry. The locn:Geometry class provides the means to identify a location as a point, line, polygon, etc. expressed using coordinates in some coordinate reference system.MAYMAYMAYMAYMAYMAYMAYMAYMAYgood to haveMAYMAYgood to haveMAY
35
SPATIALIs defined by [UK Administrative Regions, Postcodes etc]rdfs:isDefinedBy is an instance of rdf:Property that is used to indicate a resource defining the subject resource. This property may be used to indicate an RDF vocabulary in which a resource is described.MAYMAYMAYMAYMAYMAYMAYSHOULDMAYSHOULDMAY
36
QUALITYQuality annotationRepresents quality annotations, including ratings, quality certificates or feedback that can be associated to datasets or distributions. Quality annotations must have one oa:motivatedBy statement with an instance of oa:Motivation (and skos:Concept) that reflects a quality assessment purpose. We define this instance as dqv:qualityAssessment.MAYSHOULDIf quality data is the key aim of this project. Perhaps insisting that this is addressed is important?MAYI don't like the term "quality" as this is very subjective - I would personally call it "additional information" as it is also less emotive! People aren't always willing to identify known issues if it is perceived in a negative light!SHOULDEven a short description of issues in the dataset can save a lot of work, ideally there should be a standard template for describing these issuesMAYI continue to be nervous about single "scores" for quality for energy datasets. Dataset A might be awesome for use-case 1, but awful for use-case 2. There's no single quality metric that satisfies all possible use-cases.MAYMAYMAYMAYMAYMAYMAY
37
ACCESSAccess URLA URL of the resource that gives access to a distribution of the dataset. E.g. landing page, feed, SPARQL endpoint.MUSTMUSTMUSTMUSTMUSTThis could also be a public cloud bucket URL, e.g. "gs://nowcasting-data" for Google CloudMUSTMUSTMUSTSHOULDThis may either not exist or may be provided directly rather than by external reference. It should probably be a requirement to do either a reference or inline definition, but requiring an access URL as an item in of itself may not always be appropriateSHOULDif there is such urlMUSTSHOULDif there is such urlMUSTMUST
38
ACCESSDownload URLThe URL of the downloadable file in a given format. E.g. CSV file or RDF file. The format is indicated by the distribution’s dcterms:format and/or dcat:mediaType.SHOULDSHOULDSHOULDSHOULDSHOULDSHOULDMAYSHOULDSHOULDMUST
Helps get the download and improves user experience if the link is not immediately obvious from the access url.
SHOULDMUST
Helps get the download and improves user experience if the link is not immediately obvious from the access url.
39
HAVE ANY MORE? ADD THEM BELOW
40
Type of datasetTimeseries? Map? Survey of regulations? List of energy assets? (suggested by Jack)Agree stronglyAgree - controlled vocab needed for interoperabilitySHOULDWould be useful to have classes of dataset, e.g. tabular/APIshouldAgree.Agree.agree
41
Energy resourceSolar / wind / CCGT / etc. (suggested by Jack)Agree stronglyHow is this different from Subject/Keyword? SHOULDperhaps simplest to link on fuel rather than technologyshouldAren't these covered by Subject?agree
42
Which physical quantities does this dataset record?Power? Temperature? etc. Agree stronglyMay be more difficult to know, or may be very complicated to explainSHOULDAlready a lot of existing work here from the OEO based on the UNITs ontology
43
UnitskW? Kelvin? Volts?agree
44
Cost of the datasetWhether the dataset is free / free at the point of use, and the situations in which the data must be paid forAgree stronglyAgreeCovered by licensing?ShouldBut probably will be covered in the landing page of the dataset/ licencing.
Agree strongly. Sometimes this is hard to find out.
ShouldBut probably will be covered in the landing page of the dataset/ licencing.
Agree strongly. Sometimes this is hard to find out.
45
Linked/reference datasetA related dataset that is likely to be needed when interpreting the dataset in focusSo like calibration datasets?ShouldShould
46
Size on diskThe size of the dataset (after compression), e.g. in GBytesAgreeagreeSHOULDAgreeAgree
47
Size (before compression)For dense N-dimensional arrays, give the size of each dimension. e.g. number of rowsAgree
48
Historical dataset? Or continually updated? Or bothMaybe should have two metadata entries: one for the historical dataset; another for the live dataAgreeThere are lots of differences in metadata requirement from timeseries (continuous) and one-off data sets, but could this distinction be covered under the type description?
would agree this could be a helpful way of splitting datasets into some that are updated on a regular basis (perhaps with changes in historical data) and those that are snapshots (i.e. may contain errors that are subsequently updated by the ongoing updated data)
49
Number of energy assets described by the dataset1 solar system? 1,000,000 solar systems?!
50
Naming convention for energy assetsMPAN? REPD?Agree
51
Any privacy concerns related to datasete.g. very specific locational dataOr whether there is any sort of Ethics process? I suppose perhaps the Access area could have some metadata field about whether it is open or if there is a process to follow? With this as controlled vocab
52
Data processingIs this raw data? Or aggregated in some way? Or cleaned? Can we link to the Python script used to process the data?MAY
53
Embargo / (c) expiry?can (c) data transition to (cc) data after a period N? i.e. after which point in time does the data become CC-BY-4.0 (this would have to be tied to a version)Agree stronglyAgreeMAY
55
The meaning of each column in the datasete.g. "col1 = Power in kW; col2 = UTC timestamp in ISO 8601 format"Context is so important, but need to be open to how this is achievedHandled within MEDA by recommending CSVW (JSON-LD representation)
Strongly agree. Units should also be obvious.
Strongly agree. Units should also be obvious.
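One response notes that column meanings can be handled with CSVW in its JSON-LD representation. As a rough sketch of what that could look like for the two example columns in this row, the snippet below builds a CSVW-style metadata document. The file name `readings.csv`, the column names and the `column` helper are hypothetical, and the use of `dc:description` for the free-text meaning is one possible choice, not a confirmed project convention.

```python
import json


def column(name, title, datatype, description):
    """Describe one CSV column in CSVW terms (illustrative helper)."""
    return {
        "name": name,
        "titles": title,
        "datatype": datatype,
        "dc:description": description,
    }


metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "readings.csv",  # hypothetical data file
    "tableSchema": {
        "columns": [
            column("col1", "Power", "number", "Power in kW"),
            column("col2", "Timestamp", "dateTime",
                   "UTC timestamp in ISO 8601 format"),
        ]
    },
}

print(json.dumps(metadata, indent=2))
```

Keeping the unit in the description, as here, is the simplest option; a stricter scheme might carry units in a dedicated property drawn from a units vocabulary.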
56
Who funded the collection of the data?Might already be covered by Creator, but could be good for transparency.
57
Could be interesting to have a section on uncertaintymight be covered under the description
58
Temporal aggregationAt least "period-ending" or "period-beginning". Even better: "Each row represents the mean of the previous hour of data"
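To illustrate why the labelling convention matters, here is a small sketch that turns a row's timestamp into the interval it actually covers. The function name `covered_interval` and the two convention strings are assumptions taken from the wording of this row, not an agreed vocabulary.

```python
from datetime import datetime, timedelta


def covered_interval(timestamp, period, convention):
    """Return (start, end) of the interval a row's timestamp refers to."""
    if convention == "period-ending":
        return timestamp - period, timestamp
    if convention == "period-beginning":
        return timestamp, timestamp + period
    raise ValueError(f"unknown convention: {convention!r}")


# The same label means two different hours depending on the convention.
label = datetime.fromisoformat("2021-01-01T12:00:00+00:00")
hour = timedelta(hours=1)
ending = covered_interval(label, hour, "period-ending")
beginning = covered_interval(label, hour, "period-beginning")
```

With an hourly period, the two conventions place the same label a full hour apart, which is enough to misalign datasets that are joined on timestamp.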
59
Meaning of polarityWhat does a negative powerflow mean? e.g. -ve means power flowing from grid to building; +ve means power flowing from building to gridstrongly agree
60
would be great to also have a below the line comments section for each datasetprovides a community way of allowing issues with the data set to be pointed out
61
Value used to identify null datae.g. "NaN", " ", "?", "-"Should be able to specify when there are multiple null types, ideally should also be able to describe these at a column levelstrongly agree
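The request for column-level null markers could be met with a simple mapping from column name to the set of strings that mean "no data". The sketch below assumes such a mapping is supplied alongside the file; the column names, the markers chosen and the `read_rows` helper are all hypothetical.

```python
import csv
import io

# Hypothetical per-column null markers; whitespace-only cells are
# stripped first, so " " is covered by the empty string.
NULL_MARKERS = {
    "power_kw": {"NaN", "", "?"},
    "status": {"-"},
}


def read_rows(text, null_markers):
    """Parse CSV text, replacing each column's null markers with None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        cleaned = {}
        for col, value in raw.items():
            markers = null_markers.get(col, set())
            is_null = value is None or value.strip() in markers
            cleaned[col] = None if is_null else value
        rows.append(cleaned)
    return rows


sample = "power_kw,status\n1.5,ok\n?,-\n ,ok\n"
rows = read_rows(sample, NULL_MARKERS)
```

Because the markers are scoped per column, a "-" in `power_kw` would be kept as a value here, which may or may not be what a publisher intends; that choice is exactly what the metadata would need to state.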
62
Access methodHow the data is accessible eg by a URL/by following a registration process/not open/it's so big you need to SFTP it!I think it would be helpful to be clear about how data can be accessed, rather than assuming it will be a URL