Table 4 Number of distinct values per variable in each sample of implausible records. The counts are shown for the autoencoder (AE), FindFPOF (FF), and random selection (RS) samples, both for all tumor localizations studied (overall) and separately for breast, colorectal, and prostate tumors. The implausible records identified by the autoencoder are more diverse than those identified by FindFPOF overall, for breast tumors, and for colorectal tumors. For prostate tumors, the implausible records of the FindFPOF sample are more diverse. The randomly selected implausible records are much more homogeneous than the implausible records of the other two samples. We highlight the highest number of distinct values. The variables are sorted by decreasing number of distinct values in the complete random sample.

From: Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

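| Variable | Overall AE | Overall FF | Overall RS | Breast AE | Breast FF | Breast RS | Colorectal AE | Colorectal FF | Colorectal RS | Prostate AE | Prostate FF | Prostate RS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ICD-10 code | 21 | 15 | 11 | 6 | 1 | 2 | 13 | 12 | 8 | 2 | 2 | 1 |
| TNM T | 12 | 9 | 10 | 6 | 0 | 2 | 10 | 6 | 6 | 6 | 6 | 3 |
| ICD-O topography | 17 | 13 | 9 | 5 | 1 | 2 | 11 | 11 | 6 | 1 | 1 | 1 |
| Grading | 9 | 7 | 7 | 6 | 2 | 2 | 8 | 7 | 6 | 4 | 4 | 2 |
| ICD-O morphology | 23 | 19 | 6 | 8 | 2 | 2 | 17 | 17 | 4 | 4 | 5 | 1 |
| TNM N | 9 | 5 | 5 | 4 | 0 | 2 | 8 | 4 | 5 | 4 | 4 | 2 |
| Diagnosis age (binned) | 5 | 5 | 5 | 3 | 3 | 3 | 5 | 5 | 4 | 2 | 4 | 2 |
| Age at death (binned) | 5 | 5 | 5 | 2 | 2 | 1 | 5 | 5 | 4 | 2 | 4 | 1 |
| Lateral localization | 5 | 4 | 4 | 3 | 1 | 3 | 5 | 4 | 4 | 3 | 3 | 1 |
| TNM M | 10 | 7 | 4 | 2 | 1 | 2 | 7 | 5 | 3 | 5 | 5 | 2 |
| Metastasis | 7 | 7 | 4 | 4 | 3 | 2 | 5 | 5 | 2 | 2 | 4 | 2 |
| Diagnosis assurance | 7 | 6 | 3 | 4 | 2 | 2 | 5 | 5 | 2 | 4 | 4 | 1 |
| c/p-prefix T | 3 | 3 | 3 | 2 | 0 | 2 | 3 | 3 | 3 | 2 | 2 | 2 |
| c/p-prefix N | 3 | 3 | 3 | 2 | 0 | 2 | 3 | 3 | 3 | 2 | 2 | 2 |
| c/p-prefix M | 3 | 2 | 3 | 2 | 1 | 1 | 3 | 2 | 3 | 2 | 2 | 2 |
| Sex | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 |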
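
The diversity measure reported in this table, the number of distinct values per variable within a sample, could be computed along the following lines. This is only a minimal sketch and not the authors' code: the function name `distinct_value_counts`, the field names, and the toy records are hypothetical, and skipping missing values (`None`) is an assumption. The source does not state how missing entries were handled, although skipping them is one way a count of 0 could arise.

```python
from typing import Dict, Hashable, Iterable, Mapping, Optional, Sequence


def distinct_value_counts(
    records: Iterable[Mapping[str, Optional[Hashable]]],
    variables: Sequence[str],
) -> Dict[str, int]:
    """Count the distinct non-missing values of each variable in one sample."""
    seen = {variable: set() for variable in variables}
    for record in records:
        for variable in variables:
            value = record.get(variable)
            # Assumption: missing values (None) do not count as a distinct value.
            if value is not None:
                seen[variable].add(value)
    return {variable: len(values) for variable, values in seen.items()}


# Hypothetical toy sample; these are not records from the study.
sample = [
    {"icd10": "C50.9", "sex": "F", "tnm_t": None},
    {"icd10": "C50.4", "sex": "F", "tnm_t": None},
    {"icd10": "C50.9", "sex": "F", "tnm_t": None},
]
counts = distinct_value_counts(sample, ["icd10", "sex", "tnm_t"])
print(counts)
```

Applying such a function once per sample (AE, FF, RS) and per tumor group would yield one column of the table per call.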