
Table 2 Overview on Data Quality Indicators with Definitions

From: Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R

| ID | Name of indicator | Definition |
| --- | --- | --- |
| Integrity | | |
| DQI-1001 | Unexpected data elements | The observed set of available data elements does not match the expected set. |
| DQI-1002 | Unexpected data records | The observed set of available data records does not match the expected set. |
| DQI-1003 | Duplicates | The same data elements or data records appear multiple times. |
| DQI-1004 | Data record mismatch | Data records from different data sets do not match as expected. |
| DQI-1005 | Data element mismatch | Data elements from different data sets do not match as expected. |
| DQI-1006 | Data type mismatch | The observed data type does not match the expected data type. |
| DQI-1007 | Inhomogeneous value formats | The observed data values have inhomogeneous format across different data fields. |
| DQI-1008 | Uncertain missingness status | System indicated missing values (e.g. NA/./Null …) appear where a qualified missing code is expected. |
| Completeness | | |
| DQI-2001 | Missing values | Data fields without a measurement value. |
| DQI-2002 | Non-response rate | The proportion of eligible observational units for which no information could be obtained. |
| DQI-2003 | Refusal rate | The proportion of eligible individuals who refuse to give the information sought. |
| DQI-2004 | Drop-out rate | The proportion of all participants who only partially complete the study and prematurely abandon it. |
| DQI-2005 | Missing due to specified reason | Information in a data collection that is missing due to a specified reason. |
| Consistency | | |
| DQI-3001 | Inadmissible numerical values | Observed numerical data values are not admissible according to the allowed ranges. |
| DQI-3002 | Inadmissible time-date values | Observed time-date values are not admissible according to the allowed time and date ranges. |
| DQI-3003 | Inadmissible categorical values | Observed categorical data values are not admissible according to the allowed categories. |
| DQI-3004 | Inadmissible standardized vocabulary | Data values are not admissible according to the reference vocabulary. |
| DQI-3005 | Inadmissible precision | The precision of observed numerical data values does not match the expected precision. |
| DQI-3006 | Uncertain numerical values | Observed numerical values are uncertain or improbable because they are outside the expected ranges. |
| DQI-3007 | Uncertain time-date values | Observed time-date values are uncertain or improbable because they are outside the expected ranges. |
| DQI-3008 | Logical contradictions | Different data values appear in logically impossible combinations. |
| DQI-3009 | Empirical contradictions | Different data values appear in combinations deemed impossible based on empirical reasoning. |
| Accuracy | | |
| DQI-4001 | Univariate outliers | Numerical data values deviate markedly from others in a univariate analysis. |
| DQI-4002 | Multivariate outliers | Numerical data values deviate markedly from others in a multivariate analysis. |
| DQI-4003 | Unexpected locations | Observed location parameters differ from expected location parameters. |
| DQI-4004 | Unexpected shape | The observed shape of a distribution differs from the expected shape. |
| DQI-4005 | Unexpected scale | Observed scale parameters differ from expected scale parameters. |
| DQI-4006 | Unexpected proportions | Observed proportions differ from expected proportions. |
| DQI-4007 | Unexpected association strength | The observed strength of an association deviates from the expected strength of the association. |
| DQI-4008 | Unexpected association direction | The observed direction of an association (e.g. negative, positive) deviates from the expected direction. |
| DQI-4009 | Unexpected association form | The observed form of an association (e.g. linear, quadratic, exponential...) deviates from the expected form. |
| DQI-4010 | Inter-Class reliability | Differences between classes (e.g. examiners) when measuring the same or similar objects under specified conditions. |
| DQI-4011 | Intra-Class reliability | Differences within classes (e.g. examiners) when measuring the same or similar objects under specified conditions. |
| DQI-4012 | Disagreement with gold standard | Differences with a gold standard when measuring the same or similar objects under specified conditions. |

1. The term “expected” refers to a test criterion as annotated in metadata fields.
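
To make the role of metadata-annotated "expected" criteria concrete, the following minimal sketch checks a handful of the indicators defined in the table (DQI-1001, DQI-1003, DQI-2001, DQI-3001, DQI-3003) against a toy data set. It is written in Python for illustration only and is not the R implementation referred to in the article title. The metadata layout (`hard_limits`, `categories`), the function names, and the sample records are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical metadata: the "expected" test criteria per data element.
METADATA = {
    "id": {},
    "sbp": {"hard_limits": (60.0, 300.0)},   # assumed admissible range
    "sex": {"categories": {"f", "m"}},       # assumed admissible categories
}

# Hypothetical study data: one dict per data record, None marks a missing value.
RECORDS = [
    {"id": 1, "sbp": 120.0, "sex": "f"},
    {"id": 2, "sbp": 999.0, "sex": "m"},
    {"id": 2, "sbp": None, "sex": "x"},
    {"id": 3, "sbp": 135.5, "sex": "m", "note": "extra"},
]


def unexpected_elements(records, metadata):
    """DQI-1001: elements that are observed but not expected, or vice versa."""
    observed = set()
    for rec in records:
        observed.update(rec)
    return observed ^ set(metadata)


def duplicate_ids(records, key="id"):
    """DQI-1003: identifiers that occur in more than one data record."""
    counts = Counter(rec[key] for rec in records if rec.get(key) is not None)
    return sorted(k for k, n in counts.items() if n > 1)


def missing_counts(records, metadata):
    """DQI-2001: number of records without a value, per expected element."""
    return {
        name: sum(rec.get(name) is None for rec in records)
        for name in metadata
    }


def inadmissible_numeric(records, metadata):
    """DQI-3001: (record index, element, value) outside the allowed range."""
    hits = []
    for i, rec in enumerate(records):
        for name, spec in metadata.items():
            limits = spec.get("hard_limits")
            value = rec.get(name)
            if limits is None or value is None:
                continue
            low, high = limits
            if not (low <= value <= high):
                hits.append((i, name, value))
    return hits


def inadmissible_categorical(records, metadata):
    """DQI-3003: (record index, element, value) not among allowed categories."""
    hits = []
    for i, rec in enumerate(records):
        for name, spec in metadata.items():
            allowed = spec.get("categories")
            value = rec.get(name)
            if allowed is None or value is None:
                continue
            if value not in allowed:
                hits.append((i, name, value))
    return hits


if __name__ == "__main__":
    print("DQI-1001:", unexpected_elements(RECORDS, METADATA))
    print("DQI-1003:", duplicate_ids(RECORDS))
    print("DQI-2001:", missing_counts(RECORDS, METADATA))
    print("DQI-3001:", inadmissible_numeric(RECORDS, METADATA))
    print("DQI-3003:", inadmissible_categorical(RECORDS, METADATA))
```

Keeping every criterion in one metadata structure, rather than hard-coding it in each check, mirrors the footnote: an indicator can only be evaluated for elements whose expected values have been annotated, so elements without an annotation are skipped rather than flagged.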