
Table 3 Number of selected and implausible records overall and per tumor localization in each sample. FindFPOF and the autoencoder achieved a higher precision than the baseline (the random sample), both overall and for each tumor localization. In the random sample, \(8\%\) of all records and \(2\%\) of the breast records were implausible; in the autoencoder sample, \(28\%\) of all records and \(10\%\) of the breast records were implausible. FindFPOF and the autoencoder selected more records from those localizations that had a higher percentage of implausible records in the random sample. In the random sample, \(18\%\) of the colorectal records were implausible, compared with only \(2\%\) of the breast records. Accordingly, approximately two-thirds of the records returned by the autoencoder and FindFPOF were colorectal, and approximately one-third of these colorectal records were implausible. In contrast, the samples returned by the autoencoder and FindFPOF contained a lower percentage of breast records (\(28\%\) and \(14\%\), respectively) than both the random sample and the full dataset (\(58\%\) and \(54\%\), respectively). For each sample, the tumor localizations with the highest number of selected and implausible records are highlighted.

From: Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

 

| Sample | Records | Value | All localizations | Breast | Colorectal | Prostate |
| --- | --- | --- | --- | --- | --- | --- |
| Full dataset | All | n \(\left( \frac{n}{n_{all}} \right)\) | 21,104 (100%) | 11,573 (54%) | 6995 (34%) | 2536 (12%) |
| Random sample | Selected | n \(\left( \frac{n}{n_{all}} \right)\) | 300 (100%) | 172 (58%) | 87 (28%) | 41 (14%) |
| Random sample | Implausible | \(\#impl\) \(\left( \text{precision: } \frac{\#impl}{n} \right)\) | 23 (8%) | 4 (2%) | 16 (18%) | 3 (8%) |
| Autoencoder sample | Selected | n \(\left( \frac{n}{n_{all}} \right)\) | 300 (100%) | 85 (28%) | 193 (64%) | 22 (8%) |
| Autoencoder sample | Implausible | \(\#impl\) \(\left( \text{precision: } \frac{\#impl}{n} \right)\) | 83 (28%) | 9 (10%) | 67 (34%) | 7 (32%) |
| FindFPOF sample | Selected | n \(\left( \frac{n}{n_{all}} \right)\) | 300 (100%) | 40 (14%) | 200 (66%) | 60 (20%) |
| FindFPOF sample | Implausible | \(\#impl\) \(\left( \text{precision: } \frac{\#impl}{n} \right)\) | 83 (28%) | 3 (8%) | 65 (32%) | 15 (24%) |
| All samples | Selected | Total (different) | 900 (785) | 297 (266) | 480 (406) | 123 (113) |
| All samples | Implausible | Total (different) | 189 (157) | 16 (14) | 148 (124) | 25 (19) |
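The two ratios reported in Table 3, the share of selected records per localization (\(n / n_{all}\)) and the precision (\(\#impl / n\)), can be recomputed from the raw counts. The following is a minimal sketch, not the authors' code: the names `TABLE3`, `share`, `precision`, and `summarize` are hypothetical, and only the counts are taken from the table. The unrounded ratios may differ by about one percentage point from the printed percentages, since the table's rounding convention is not stated.

```python
# Counts copied from Table 3: sample -> localization -> (selected n, implausible #impl).
# The dictionary layout and all function names are illustrative assumptions.
TABLE3 = {
    "Random sample": {
        "All": (300, 23), "Breast": (172, 4), "Colorectal": (87, 16), "Prostate": (41, 3),
    },
    "Autoencoder sample": {
        "All": (300, 83), "Breast": (85, 9), "Colorectal": (193, 67), "Prostate": (22, 7),
    },
    "FindFPOF sample": {
        "All": (300, 83), "Breast": (40, 3), "Colorectal": (200, 65), "Prostate": (60, 15),
    },
}


def share(n_selected, n_all):
    """Fraction of a sample that belongs to one localization: n / n_all."""
    return n_selected / n_all


def precision(n_selected, n_implausible):
    """Fraction of selected records that were implausible: #impl / n."""
    return n_implausible / n_selected


def summarize(table):
    """Return share and precision for every sample and localization."""
    summary = {}
    for sample, cells in table.items():
        n_all = cells["All"][0]  # total number of selected records in this sample
        summary[sample] = {
            loc: {"share": share(n, n_all), "precision": precision(n, impl)}
            for loc, (n, impl) in cells.items()
        }
    return summary


if __name__ == "__main__":
    for sample, rows in summarize(TABLE3).items():
        for loc, values in rows.items():
            print(f"{sample:20s} {loc:11s} "
                  f"share={values['share']:.1%} precision={values['precision']:.1%}")
```

Comparing the `precision` values of the autoencoder and FindFPOF samples with those of the random sample reproduces the caption's statement that both methods exceed the baseline overall and for each tumor localization.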