Metadata
Item | Concept | Guidance | Field name | Required? | Controlled vocabulary? If so, what are the terms | Should this be a facet? | Where should this be located? | Notes
Step 1 | Data provision
1a | Population included in the data set | Data providers should give details of the population included in the data set (e.g. everyone registered with a GP), the geographic coverage of the data (e.g. England and Wales), the number of records in each source data set and how any ‘opt-outs’ were dealt with | Notes: these are part of ICPSR's regular metadata collection
1b | Linkability of the data set | Details should be shared about how the data were generated (e.g. face-to-face), processed (e.g. a self-entered form or entered by an administrator) and quality controlled (e.g. manually checked), including how identifying characteristics were:
1b(i) | 1b(i) – Collected and allocated
1b(ii) | 1b(ii) – Updated as further personal data were collected, and dates of most recent updates
1b(iii) | 1b(iii) – Checked and cleaned, including any validation rules
1b(iv) | 1b(iv) – Replaced with artificial identifiers to reduce disclosure before being released for linkage
Step 2 | Data linkage
2a | Descriptions of linkage processes | Data linkers should provide descriptions of how the linkage was done, including:
2a(i) | 2a(i) – A clear description of the data sources and identifying characteristics used for linkage, details of how identifiers were cleaned and validated before linkage, patterns of missingness, the expected range of values after cleaning, and how any de-duplication was performed.
2a(ii) | 2a(ii) – Details of any transformation or replacement with artificial identifiers before linkage
2a(iii) | 2a(iii) – A detailed description of the method (or algorithm) used for linkage, whether it was rule-based (e.g. deterministic) or score-based (e.g. probabilistic linkage), and how multiple linkages were handled | yes | yes, as far as Prob or deter, but also text? | yes
2a(iv) | 2a(iv) – A detailed description of any new derived variables that were introduced during the linkage process (e.g. confidence level or probability of linkage or link score)
2a(v) | 2a(v) – Details of any blocking or grouping methods used for score-based linkage and how match scores were derived | yes
2b | Record-level indicators of the linkage process | Data linkers should provide analysts with record-level indicators of the data linkage process to enable adjustments for linkage error in the analyses. Indicators could include the pass-ID (the step in a rule-based linkage process when a pair of records linked), or match scores (e.g. match weights used in probabilistic linkage)
2c | Aggregate linkage results | Data linkers should make available descriptions, tables and flow diagrams depicting linkage accuracy for each linkage undertaken. These should include:
2c(i) | 2c(i) – A description of the number of records that were linked and unlinked in each of the source files
2c(ii) | 2c(ii) – A table comparing the aggregate characteristics of individuals in the linked and unlinked records for each source data set (defined by the analyst in agreement with the data linker)
2c(iii) | 2c(iii) – A description of the ‘representativeness’ of the linked data set to each source data set, for example, including weights that can be applied to allow grossing up the linked data set to better represent the source data sets
2c(iv) | 2c(iv) – A flow diagram to represent the steps in linkage and numbers involved at each step
2d | Generic reports of linkage accuracy | The data linker should report generic information about the quality of linkage carried out. This should include:
2d(i) | 2d(i) – Estimates of linkage error rates based on regular quality monitoring of linkage accuracy. For example, measures of the sensitivity and specificity for the algorithm used
2d(ii) | 2d(ii) – Details of how error rates were estimated, for example, by comparing linked records with a reference data set
2e | Descriptions of disclosure controls | Data linkers should describe any statistical disclosure controls used to reduce identifiability of linked data prior to release to data analysts
2f | Overview of data linkage | Data linkers should establish systems to improve the quality of linkage studies, for example, by publishing a database detailing the data linkages undertaken with links to publications. The advisory and approvals structure for data linkage should include experts who can scrutinize the impact of linkage processes on results of analyses
Step 3 | Data analyses | Data analysts should assess and report on the quality of the linked data used for analyses
3a | Account for linkage error | Analysts should report how analyses took into account linkage error, including:
3a(i) | – How record-level indicators of the linkage process or aggregate measures reflecting linkage quality were used for adjustments, including underlying assumptions and methods used
3a(ii) | – Uncertainty analyses of the effects of linkage errors
3a(iii) | – Sensitivity analyses to determine the impact of assumptions used in the analyses
Step 4 | Reporting study findings | Reports of linkage studies should, where possible, include items in Steps 1–3, building on the RECORD statement for research reports
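
Illustration for items 2a(iii), 2b and 2d(i): the Python sketch below shows one way a score-based match weight and linkage error rates might be computed. It is not part of the guidance. The identifier names, the m- and u-probabilities, and the function names match_weight and error_rates are all hypothetical, and the weight uses the common log-likelihood-ratio (Fellegi-Sunter style) formulation as an assumption, not a method prescribed in the table.

```python
import math

# Hypothetical agreement probabilities per identifier (illustrative values only).
# m = P(identifier agrees | the two records are a true match)
# u = P(identifier agrees | the two records are not a match)
M_U = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.001),
}


def match_weight(agreement):
    """Return a match weight for one candidate record pair.

    `agreement` maps identifier name -> True/False (agree/disagree).
    Each identifier contributes a log2 likelihood ratio; the sum is the
    kind of record-level match score described in item 2b.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_U[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def error_rates(predicted_links, true_links, all_pairs):
    """Return (sensitivity, specificity) against a reference set of true links.

    All three arguments are sets of record pairs; `predicted_links` and
    `true_links` are assumed to be subsets of `all_pairs`. The reference
    set must contain at least one true link and one true non-link.
    """
    tp = len(predicted_links & true_links)   # true links that were made
    fn = len(true_links - predicted_links)   # missed links
    fp = len(predicted_links - true_links)   # false links
    tn = len(all_pairs) - tp - fn - fp       # correctly unlinked pairs
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Example: a pair agreeing on every identifier scores higher than a pair
# that disagrees on postcode.
full = match_weight({"surname": True, "date_of_birth": True, "postcode": True})
partial = match_weight({"surname": True, "date_of_birth": True, "postcode": False})
```

A data linker following item 2b could release a value such as the output of match_weight alongside each linked record, and report the output of error_rates, together with how the reference set was obtained, under items 2d(i) and 2d(ii).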