Data on the Web Best Practices - Implementation Report
Name of reporter:
Contact info:
Date of submission:
Evidence: each piece of evidence can be a link to a data portal or to a specific dataset.
Columns: BP | How to test? | Evidence 1 (URL, pass/fail) | Evidence 2 (URL, pass/fail)
If the evidence uses metadata to describe a dataset then
BP1: Check if human-readable metadata is available.
BP1: Check if the metadata is available in a valid machine-readable format and without syntax errors.
BP2: Check if the metadata for the dataset itself includes the overall features of the dataset in a human-readable format.
BP2: Check if the descriptive metadata is available in a valid machine-readable format.
BP3: Check if the metadata for the dataset itself includes information about locale parameters (i.e. date, time, and number formats, and language) in a human-readable format.
BP3: Check if the metadata with locale information is available in a valid machine-readable format and without syntax errors.
BP4: Check if the structural metadata of the dataset is provided in a human-readable format.
BP4: Check if the metadata of the distribution includes structural information about the dataset in a machine-readable format and without syntax errors.
BP5: Check if the metadata for the dataset itself includes the data license information in a human-readable format.
BP5: Check if a user agent can automatically detect/discover the data license of the dataset.
BP6: Check that the metadata for the dataset itself includes the provenance information about the dataset in a human-readable format.
BP6: Check if a computer application can automatically process the provenance information about the dataset.
BP7: Check that the metadata for the dataset itself includes quality information about the dataset.
BP7: Check if a computer application can automatically process the quality information about the dataset.
BP8: Check if the metadata for the dataset/distribution provides a unique version number or date in a human-readable format.
BP8: Check if a computer application can automatically detect/discover the unique version number or date of a dataset or distribution.
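As an illustration of the machine-readable checks in this section (for example BP1, BP5 and BP8), here is a minimal sketch that assumes the metadata is published as a JSON object. The field names `license` and `version` are hypothetical and depend on the vocabulary the publisher actually uses. The sketch only confirms that the document parses without syntax errors and that the expected fields can be found automatically.

```python
import json

def check_metadata(text, required=("license", "version")):
    """Return (parses_ok, missing_fields) for a JSON metadata document.

    `required` lists hypothetical field names; adjust them to the
    vocabulary actually used by the publisher.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Syntax error: the machine-readable check fails outright.
        return False, list(required)
    if not isinstance(doc, dict):
        # Valid JSON, but not an object that could carry the fields.
        return True, list(required)
    missing = [name for name in required if name not in doc]
    return True, missing
```

A result of `(True, [])` would count as a pass for this sketch; anything else points to a syntax error or to fields that could not be discovered.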
If the dataset has more than one version then
BP9: Check that a list of published versions is available, as well as a change log describing precisely how each version differs from the previous one.
If identifiers are used for the dataset and/or within the dataset then
BP10: Check that each dataset is identified using a URI that has been designed for persistence. Ideally the relevant Web site includes a description of the design scheme and a credible pledge of persistence should the publisher no longer be able to maintain the URI space themselves.
BP11: Check that within the dataset, references to things that don't change or that change slowly, such as countries, regions, organizations and people, are referred to by URIs or by short identifiers that can be appended to a URI stub. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.
BP12: Check that each version of a dataset has its own URI, and that there is also a "latest version" URI.
If the dataset is available in a machine-readable format then
BP13: Check if the data format conforms to a known machine-readable data format specification.
BP14: Check if the complete dataset is available in more than one data format.
If vocabularies and/or code lists were used to create the dataset then
BP15: Check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets.
BP15: Check if the terms or codes in the vocabulary to be used are defined by a standards development organization such as IETF, OGC or W3C, or are published by a suitable authority, such as a government agency.
BP16: This is almost always a matter of subjective judgement with no objective test. As a general guideline: Are common vocabularies used, such as Dublin Core and schema.org? Are simple facts stated simply and retrieved easily? For formal knowledge representation languages, does applying an inference engine on top of the data that uses a given vocabulary avoid producing too many statements that are unnecessary for target applications?
If the dataset is available for bulk download then
BP17: Check if the full dataset can be retrieved with a single request.
If the dataset is large then
BP18: Check that the entire dataset can be recovered by making multiple requests that retrieve smaller units.
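One way to exercise the BP18 check is to reassemble the dataset from smaller units and compare the result with the full dataset. The sketch below assumes an offset/limit style of paging; `fetch_page` and `fake_fetch_page` are hypothetical stand-ins for whatever subsetting mechanism the publisher really offers.

```python
def recover_in_pages(fetch_page, page_size):
    """Collect all records by requesting successive pages until one is empty."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        records.extend(page)
        offset += len(page)
    return records

# Stand-in for a real paged endpoint, used here only for illustration.
FULL_DATASET = list(range(10))

def fake_fetch_page(offset, limit):
    return FULL_DATASET[offset:offset + limit]

# The check passes when the reassembled records equal the full dataset.
recovered_ok = recover_in_pages(fake_fetch_page, 3) == FULL_DATASET
```

In a real test, `fetch_page` would issue the HTTP requests and the comparison would be made against the bulk download.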
If content negotiation is used to provide multiple representations for the same resource then
BP19: Check the available representations of the resource and try to get them by specifying the accepted content in the HTTP request header.
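The BP19 check can be sketched with the Python standard library as follows. This is only an illustration: the function names are hypothetical, and the comparison assumes the server reports the representation it returned in the `Content-Type` response header.

```python
import urllib.request

def build_negotiated_request(url, media_type):
    """Build a GET request asking for one specific representation."""
    return urllib.request.Request(url, headers={"Accept": media_type})

def representation_matches(requested, content_type_header):
    """Compare the requested media type with a Content-Type header value,
    ignoring parameters such as charset and letter case."""
    returned = content_type_header.split(";", 1)[0].strip().lower()
    return returned == requested.strip().lower()

def check_representation(url, media_type):
    """Perform the request (needs network access) and report whether the
    server returned the representation that was asked for."""
    req = build_negotiated_request(url, media_type)
    with urllib.request.urlopen(req) as resp:
        content_type = resp.headers.get("Content-Type", "")
    return representation_matches(media_type, content_type)
```

Calling `check_representation` once per advertised media type gives one pass/fail result per representation. Note that `urlopen` raises `urllib.error.HTTPError` for error responses such as 406, which a fuller test would need to catch.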
If data is available in real time then
BP20: To adequately test real-time data access, data will need to be tracked from the time it is initially collected to the time it is published and accessed. [PROV-O] can be used to describe these activities. Caution should be used when analyzing real-time access for systems that consist of multiple computer systems. For example, tests that rely on wall-clock timestamps may reflect inconsistencies between the individual computer systems as opposed to data publication latency.
If data is provided up to date then
BP21: Check that the update frequency is stated and that the most recently published copy on the Web is no older than the date predicted by the stated update frequency.
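The date comparison in BP21 can be expressed as a small calculation. The sketch below assumes the stated update frequency can be expressed as a number of days; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

def is_up_to_date(last_published, update_every_days, today):
    """True if the latest published copy is no older than the stated
    update interval predicts."""
    next_due = last_published + timedelta(days=update_every_days)
    return today <= next_due
```

For example, a dataset published on 1 January with a stated 30-day frequency would pass when checked on 20 January and fail when checked on 1 March.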
If the dataset includes references to data that is no longer available or is not available to all users then
BP22: Check that an explanation of what is missing and instructions for obtaining access (if possible) are given. Check if a legitimate HTTP response code in the 400 or 500 range is returned when trying to get unavailable data.
If an API is used to provide dataset access then
BP23: Check if a test client can simulate calls and the API returns the expected responses.
BP24: Check that the service avoids using HTTP as a tunnel for calls to custom methods, and check that URIs do not contain method names.
BP25: Check that every call enabled by your API is described in your documentation. Make sure you provide details of what parameters are required or optional and what each call returns.
BP26: Check the Time To First Successful Call (i.e. being able to make a successful request to the API within a few minutes increases the chances that the developer will stick with your API).
BP26: Release changes initially to a test version of your API before applying them to the production version. Invite developers to test their applications on the test version and provide feedback.
If the dataset was removed or archived then
BP27: Check that dereferencing the URI of a dataset that is no longer available returns information about its current status and availability, using either a 410 or 303 response code as appropriate.
BP28: It is impossible to determine what will be available in, say, 50 years' time. However, one can check that an archived dataset depends only on widely used external resources and vocabularies. Check that unique or lesser-used dependencies are preserved as part of the archive.
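The response-code part of the BP27 check is simple to automate once the status code has been obtained. The following is a minimal sketch with a hypothetical function name. It leaves open how the status code is fetched, and it does not verify that the response also carries information about the dataset's current status and availability.

```python
# Status codes named in the BP27 test: 410 (Gone) and 303 (See Other).
ACCEPTED_STATUS_CODES = {410, 303}

def removed_dataset_status_ok(status_code):
    """True if the response code for a removed or archived dataset is
    one of the two codes the test accepts."""
    return status_code in ACCEPTED_STATUS_CODES
```

When fetching the code, a test client would need to disable automatic redirect following, otherwise a 303 response is replaced by the status of the redirect target.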
If you collect feedback about the dataset then
BP29: Check that at least one feedback mechanism is provided and readily discoverable by data consumers.
BP30: Check that any feedback given by data consumers for a specific dataset or distribution is publicly available.
Data enrichment
BP31: Look for missing values in the dataset or additional fields likely to be needed by others. Check that any data added by inferential enrichment techniques is identified as such and that any replaced data is still available. Check that code used to enrich the data is available. Check whether the metadata being extracted is in accordance with human knowledge and readable by humans.
BP32: Check that the dataset is accompanied by some additional interpretive content that can be perceived without downloading the data or invoking an API.
If an existing dataset was used to create the dataset then
BP33: Check that you have a record of at least one communication informing the publisher of your use of the data.
BP34: Read through the original license and check that your use of the data does not violate any of the terms.
BP35: Check that the original source of any reused data is cited in the metadata provided. Check that a human-readable citation is readily visible in any user interface.