Guidelines on Data Quality Metrics

The ubiquitous nature of the Internet has resulted in a massive increase in the number of digital users and their activity, which in turn is generating an unprecedented amount of data. The ability to easily capture and store this data has given it an increasingly significant economic role, and it is now used to drive key decisions in many companies. Corporations rely on high-quality data for analytics to reveal opportunities for cost reduction, optimize investments, and increase revenue. At the same time, data owners across sectors are beginning to view their datasets as not only fundamentally valuable but also economically viable to distribute, resulting in a growing marketplace for data.

While it is relatively easy to amass large amounts of data, sorting, tagging, and presenting that data in a way that enables users to glean valuable information is far less straightforward. There can be real variation in the quality of datasets available for purchase. Without a set of common metrics for comparing the relative quality of datasets, potential buyers face challenges, especially when choosing amongst similar datasets from multiple data providers.

This set of guidelines recommends a baseline, industry-agnostic set of data quality metrics for adoption by data providers. In addition to describing the methodology for deriving this set of metrics, the guidelines consider tools for relaying the metrics to end-users. A common set of metrics allows users to more easily compare the quality of different datasets and to match their expectations against available datasets.
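
To illustrate what a domain-agnostic quality metric might look like in practice, the sketch below computes a simple completeness score (the fraction of populated cells) over a tabular dataset. The choice of completeness as the metric, the column names, and the pandas-based implementation are illustrative assumptions only; the actual baseline metrics and the methodology for deriving them are specified in the full document.

    import pandas as pd

    def completeness(df: pd.DataFrame) -> float:
        """Return the fraction of non-null cells in the dataset (0.0 to 1.0)."""
        total_cells = df.size
        if total_cells == 0:
            return 0.0
        return 1.0 - df.isna().sum().sum() / total_cells

    # Hypothetical usage: a small tabular dataset with some missing values.
    records = pd.DataFrame({
        "record_id": [1, 2, 3, 4],
        "postal_code": ["018956", None, "049315", "068805"],
        "amount": [120.50, 88.00, None, 45.90],
    })
    print(f"Completeness: {completeness(records):.1%}")  # 83.3% of cells populated

A score like this, published alongside a dataset listing, is one way a data provider could relay a quality metric to prospective buyers in a form that is comparable across datasets.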
