Table 1 Description of original and simulated datasets

From: Evaluating bias due to data linkage error in electronic healthcare records

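| Dataset | Error distribution | Match rate | Error rate |
|---|---|---|---|
| Original data (PICANet-LabBase2) | Error varied by hospital | Matches: 1496/20924 (7%); Non-matches: 19431/20924 (93%) | 0-5% error, <1% missing values |
| Simulated datasets | | | |
| 1 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 2 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 3 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 4 | Random identifier error | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 5 | Non-random error (associated with hospital) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 6 | Non-random error (associated with outcome) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 7 | Random identifier error | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 8 | Non-random error (associated with hospital) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 9 | Non-random error (associated with outcome) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 10 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| 11 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| 12 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error, 10% missing values |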
  1. All data were linked using both highest-weight classification and PII.
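
The design of the twelve simulated datasets in Table 1 can be written down as a small parameter grid: four settings of match rate, error rate and missing-value rate, each crossed with three error distributions. The sketch below is illustrative only and is not the authors' code. The names `SimulationSpec`, `SETTINGS`, `ERROR_DISTRIBUTIONS` and `build_specs` are hypothetical, and the code only encodes the table's parameters; it does not generate or link any records.

```python
from dataclasses import dataclass
from itertools import product


# Hypothetical container for one simulated-dataset row of Table 1.
@dataclass(frozen=True)
class SimulationSpec:
    dataset: int             # dataset number as listed in Table 1
    error_distribution: str  # how identifier errors are distributed
    match_rate: float        # proportion of records that are true matches
    error_rate: float        # proportion of identifier values in error
    missing_rate: float      # proportion of identifier values missing
    n_records: int = 10_000  # every simulated dataset has 10000 records

    @property
    def n_matches(self) -> int:
        return round(self.match_rate * self.n_records)

    @property
    def n_non_matches(self) -> int:
        return self.n_records - self.n_matches


# The three error distributions, in the order they appear within each block.
ERROR_DISTRIBUTIONS = (
    "random",
    "non-random (associated with hospital)",
    "non-random (associated with outcome)",
)

# (match rate, error rate, missing-value rate) for datasets 1-3, 4-6, 7-9, 10-12.
SETTINGS = (
    (0.10, 0.05, 0.05),
    (0.50, 0.05, 0.05),
    (0.70, 0.05, 0.05),
    (0.10, 0.10, 0.10),
)


def build_specs() -> list[SimulationSpec]:
    """Return the twelve simulated-dataset specifications in table order."""
    grid = product(SETTINGS, ERROR_DISTRIBUTIONS)  # settings vary slowest
    return [
        SimulationSpec(number, dist, match, err, miss)
        for number, ((match, err, miss), dist) in enumerate(grid, start=1)
    ]


if __name__ == "__main__":
    for spec in build_specs():
        print(spec.dataset, spec.error_distribution,
              spec.n_matches, spec.n_non_matches,
              spec.error_rate, spec.missing_rate)
```

Keeping the grid as data makes it easy to check that each block of three datasets shares one match rate and error rate, as in the table, before any record-level simulation is attempted.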