From: Evaluating bias due to data linkage error in electronic healthcare records
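| Dataset | Error distribution | Match rate | Error rate |
|---|---|---|---|
| Original data (PICANet-LabBase2) | Error varied by hospital | Matches: 1496/20924 (7%); Non-matches: 19431/20924 (93%) | 0-5% error; <1% missing values |
| **Simulated datasets** | | | |
| 1 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error; 5% missing values |
| 2 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error; 5% missing values |
| 3 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 5% error; 5% missing values |
| 4 | Random identifier error | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error; 5% missing values |
| 5 | Non-random error (associated with hospital) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error; 5% missing values |
| 6 | Non-random error (associated with outcome) | Matches: 5000/10000 (50%); Non-matches: 5000/10000 (50%) | 5% error; 5% missing values |
| 7 | Random identifier error | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error; 5% missing values |
| 8 | Non-random error (associated with hospital) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error; 5% missing values |
| 9 | Non-random error (associated with outcome) | Matches: 7000/10000 (70%); Non-matches: 3000/10000 (30%) | 5% error; 5% missing values |
| 10 | Random identifier error | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error; 10% missing values |
| 11 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error; 10% missing values |
| 12 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); Non-matches: 9000/10000 (90%) | 10% error; 10% missing values |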
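The twelve simulated datasets form a grid: four blocks defined by match rate and error level, each crossed with three error mechanisms (random, associated with hospital, associated with outcome). The Python sketch below encodes that grid so the match rates can be checked. It also adds a minimal identifier-corruption function to illustrate what "5% error, 5% missing values" might mean at the record level. The `Scenario` class, the `corrupt` function, the string perturbation, and the doubling `factor` used to mimic non-random error in a flagged group are all hypothetical illustrations, not the authors' simulation code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Scenario:
    dataset: int
    mechanism: str       # "random", "hospital" or "outcome"
    n_records: int
    n_matches: int
    error_rate: float    # proportion of identifier values perturbed
    missing_rate: float  # proportion of identifier values set to missing

    @property
    def match_rate(self) -> float:
        return self.n_matches / self.n_records


def build_scenarios(n_records: int = 10000) -> List[Scenario]:
    """Enumerate the 12 simulated datasets: 4 blocks x 3 error mechanisms."""
    blocks = [  # (match proportion, error rate, missing rate), one per block of the table
        (0.10, 0.05, 0.05),
        (0.50, 0.05, 0.05),
        (0.70, 0.05, 0.05),
        (0.10, 0.10, 0.10),
    ]
    mechanisms = ["random", "hospital", "outcome"]
    scenarios = []
    for b, (match_prop, err, miss) in enumerate(blocks):
        for m, mech in enumerate(mechanisms):
            scenarios.append(Scenario(
                dataset=3 * b + m + 1,
                mechanism=mech,
                n_records=n_records,
                n_matches=round(match_prop * n_records),
                error_rate=err,
                missing_rate=miss,
            ))
    return scenarios


def corrupt(value: str, error_rate: float, missing_rate: float,
            rng: random.Random, flagged: bool = False,
            factor: float = 2.0) -> Optional[str]:
    """Return the identifier unchanged, perturbed, or missing (None).

    Hypothetical non-random variant: records with flagged=True (e.g. from
    certain hospitals, or with the outcome) have both rates multiplied by
    `factor`; with flagged=False every record has the same error probability.
    """
    scale = factor if flagged else 1.0
    u = rng.random()
    if u < missing_rate * scale:
        return None                  # identifier recorded as missing
    if u < (missing_rate + error_rate) * scale:
        return value + "*"           # hypothetical perturbation of the value
    return value                     # identifier left intact


if __name__ == "__main__":
    for s in build_scenarios():
        print(s.dataset, s.mechanism, f"{s.match_rate:.0%}", s.error_rate, s.missing_rate)
```

Passing a seeded `random.Random` into `corrupt` keeps each simulated dataset reproducible; switching from random to non-random error only requires deciding which records are flagged.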