
Table 1 Description of original and simulated datasets

From: Evaluating bias due to data linkage error in electronic healthcare records

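| Dataset | Error distribution | Match rate | Error rate |
|---|---|---|---|
| Original data (PICANet-LabBase2) | Error varied by hospital | Matches: 1496/20924 (7%); Non-matches: 19431/20924 (93%) | 0-5% error, <1% missing values |
| Simulated dataset 1 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| Simulated dataset 2 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| Simulated dataset 3 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| Simulated dataset 4 | Random identifier error | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| Simulated dataset 5 | Non-random error (associated with hospital) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| Simulated dataset 6 | Non-random error (associated with outcome) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| Simulated dataset 7 | Random identifier error | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| Simulated dataset 8 | Non-random error (associated with hospital) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| Simulated dataset 9 | Non-random error (associated with outcome) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| Simulated dataset 10 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| Simulated dataset 11 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| Simulated dataset 12 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |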

  1. All data were linked using both highest-weight classification and prior-informed imputation (PII).
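Each simulated scenario combines four settings: a match rate, an identifier error rate, a missing-value rate, and one of three error mechanisms. The Python sketch below shows, for illustration only, how one such file might be generated. It is not the procedure used in the study. Several choices are assumptions made so that the average error rate stays close to the nominal value: the 8-digit identifier, the ten hypothetical hospitals, the linear hospital gradient, and the 3:1 weighting by outcome. The names `simulate`, `error_probability` and `corrupt` are also hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    record_id: int
    is_match: bool              # True if this record has a true link in the other file
    hospital: int               # hypothetical hospital code
    outcome: int                # hypothetical binary outcome (0/1)
    identifier: Optional[str]   # linkage identifier; None means missing
    has_error: bool = False     # True if the identifier was corrupted


def error_probability(rec: Record, error_rate: float, mechanism: str,
                      n_hospitals: int) -> float:
    """Per-record error probability whose average is close to error_rate."""
    if mechanism == "random":
        return error_rate
    if mechanism == "hospital":
        # Hypothetical gradient: higher-numbered hospitals make more errors.
        # The mean of 2*(h+1)/(H+1) over h = 0..H-1 is exactly 1.
        return error_rate * 2 * (rec.hospital + 1) / (n_hospitals + 1)
    if mechanism == "outcome":
        # Hypothetical: records with outcome 1 are three times as error-prone.
        return error_rate * (1.5 if rec.outcome == 1 else 0.5)
    raise ValueError(f"unknown mechanism: {mechanism}")


def corrupt(identifier: str, rng: random.Random) -> str:
    """Replace one digit with a different digit to mimic a keying error."""
    pos = rng.randrange(len(identifier))
    new_digit = rng.choice([d for d in "0123456789" if d != identifier[pos]])
    return identifier[:pos] + new_digit + identifier[pos + 1:]


def simulate(n: int = 10_000, match_rate: float = 0.10,
             error_rate: float = 0.05, missing_rate: float = 0.05,
             mechanism: str = "random", n_hospitals: int = 10,
             seed: int = 1) -> List[Record]:
    rng = random.Random(seed)
    n_matches = round(n * match_rate)   # e.g. 1000 of 10000 for a 10% match rate
    records = []
    for i in range(n):
        rec = Record(record_id=i, is_match=i < n_matches,
                     hospital=rng.randrange(n_hospitals),
                     outcome=rng.randint(0, 1),
                     identifier=f"{i:08d}")
        if rng.random() < missing_rate:
            rec.identifier = None       # missing value
        elif rng.random() < error_probability(rec, error_rate, mechanism, n_hospitals):
            rec.identifier = corrupt(rec.identifier, rng)
            rec.has_error = True
        records.append(rec)
    return records
```

For example, `simulate(match_rate=0.50, mechanism="hospital")` corresponds loosely to scenario 5. Similarly, `simulate(match_rate=0.10, error_rate=0.10, missing_rate=0.10)` corresponds loosely to scenario 10. In this sketch, errors are applied only to non-missing identifiers. As a result, the overall share of corrupted records is slightly below the nominal error rate.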