Table 1 Data Quality Dimensions and Domains

From: Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R

| Dimension | Domain | Definition | Primary reference objects to detect data quality issues | Primary reporting metrics of indicators |
| --- | --- | --- | --- | --- |
| Integrity | | The degree to which the data conforms to structural and technical requirements. | | |
| | Structural data set error | The observed structure of a data set differs from the expected structure. | Data elements, data records | N |
| | Relational data set error | The observed correspondence between different data sets differs from the expected correspondence. | Data sets | N |
| | Value format error | The technical representation of data values within a data set does not conform to the expected representation. | Data fields | N, % |
| Completeness | | The degree to which expected data values are present. | | |
| | Crude missingness | Metrics of missing data values that ignore the underlying reasons for missing data. | Data fields | N, % |
| | Qualified missingness | Metrics of missing data values that use reasons underlying missing data. | Data fields, data elements, data records | N, % |
| Consistency | | | | |
| | Range and value violations | Observed data values do not comply with admissible data values or value ranges. | Data fields | N, % |
| | Contradictions | Observed data values appear in impossible or improbable combinations. | Data fields | N, % |
| Accuracy | | The degree of agreement between observed and expected distributions and associations. | | |
| | Unexpected distributions | Observed distributional characteristics differ from expected distributional characteristics. | Data elements, data records | Diverse statistical measures^a |
| | Unexpected associations | Observed associations differ from expected associations. | Data elements, data records | Diverse statistical measures^a |
| | Disagreement of repeated measurements | Disagreement between repeated measurements of the same or similar objects under specified conditions. | Data elements, data records | Diverse statistical measures^a |
N: number of issues; %: the percentage of issues relative to the number of assessed elements in a data structure.
^a A wide range of statistical metrics may apply, such as location, scale, or shape parameters, correlation coefficients, and measures of agreement.
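
As a rough illustration of the N and % reporting metrics defined in the footnote, the following Python sketch counts issues in a single data field for two of the domains in Table 1, crude missingness and range violations. It is not the framework's R implementation. The helper `count_and_percent`, the example values, the admissible range of 50 to 300, and the choice to assess range violations only on observed (non-missing) values are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def count_and_percent(
    values: Iterable[Optional[float]],
    is_issue: Callable[[Optional[float]], bool],
) -> Tuple[int, float]:
    """Return (N, %): the number of issues and their percentage
    relative to the number of assessed elements."""
    assessed = list(values)
    n_issues = sum(1 for v in assessed if is_issue(v))
    pct = 100.0 * n_issues / len(assessed) if assessed else 0.0
    return n_issues, pct


# Hypothetical data field (e.g. a blood pressure reading); None marks a missing value.
field = [120, None, 135, 400, None, 98, -5, 142]

# Crude missingness: count missing values, ignoring the reason they are missing.
n_missing, pct_missing = count_and_percent(field, lambda v: v is None)

# Range violations: assumed admissible range of 50 to 300,
# assessed only on the observed (non-missing) values.
observed = [v for v in field if v is not None]
n_range, pct_range = count_and_percent(observed, lambda v: not 50 <= v <= 300)

print(f"Crude missingness: N={n_missing}, %={pct_missing:.1f}")
print(f"Range violations:  N={n_range}, %={pct_range:.1f}")
```

The denominator matters for interpreting the percentage: here missingness is reported relative to all elements in the field, while range violations are reported relative to the observed values only, since a missing value cannot be checked against a range.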