
Table 1. Number of distinct values per variable, overall and within the breast-only, colorectal-only, and prostate-only tumor records. Each tumor record contains 16 categorical variables describing the disease, the patient, and the diagnostic procedure. Variables are sorted by decreasing number of distinct values overall.

From: Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

| Variable | Explanation | Example | Distinct values (overall) | Distinct values (breast) | Distinct values (colorectal) | Distinct values (prostate) |
|---|---|---|---|---|---|---|
| ICD-O morphology | Cell type and behavior of tumor | 8140/3 | 127 | 91 | 57 | 33 |
| TNM T | Spread of tumor | 4 | 74 | 59 | 33 | 28 |
| TNM N | Spread of lymph node metastases | 3 | 51 | 42 | 25 | 8 |
| ICD-10 code | Classification of disease | C50.9 | 30 | 13 | 15 | 2 |
| ICD-O topography | Location of tumor | C50.9 | 23 | 9 | 12 | 2 |
| TNM M | Presence of remote metastases | 1 | 18 | 9 | 15 | 13 |
| Metastasis | Location of remote metastases | PUL | 12 | 11 | 11 | 10 |
| Grading | Amount of abnormality of tumor | 2 | 11 | 11 | 10 | 11 |
| Diagnosis assurance | Diagnostic method | 6 | 7 | 7 | 6 | 7 |
| Lateral localization | Side location of tumor | L | 6 | 5 | 6 | 6 |
| c/p-prefix N | Diagnostic method for TNM N | C | 5 | 5 | 3 | 3 |
| Diagnosis age (binned) | Age at diagnosis | [59,65) | 5 | 5 | 5 | 5 |
| Age at death (binned) | Age at death (if patient has died) | [66,70) | 5 | 5 | 5 | 5 |
| Sex | Sex of patient | W | 3 | 3 | 3 | 3 |
| c/p-prefix T | Diagnostic method for TNM T | C | 3 | 3 | 3 | 3 |
| c/p-prefix M | Diagnostic method for TNM M | C | 3 | 3 | 3 | 3 |
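
The counts in Table 1 are per-variable cardinalities, computed once over all records and once per tumor entity, then sorted by the overall count. A minimal Python sketch of that kind of tally follows. It is not the authors' code: the function name `distinct_value_counts`, the `entity` field, and the three toy records are hypothetical and exist only for illustration. It also assumes that a missing value (`None`) is skipped, whereas the registry data may encode "unknown" as a category of its own.

```python
from collections import defaultdict


def distinct_value_counts(records, variables, group_key=None):
    """Count distinct values per variable, overall and per group.

    Returns a dict keyed by variable, ordered by decreasing overall count.
    """
    # seen[variable][group] is the set of values observed so far
    seen = defaultdict(lambda: defaultdict(set))
    for record in records:
        groups = ["Overall"]
        if group_key is not None:
            groups.append(record[group_key])
        for variable in variables:
            value = record.get(variable)
            if value is None:
                continue  # assumption: missing values are not counted
            for group in groups:
                seen[variable][group].add(value)

    counts = {
        variable: {group: len(values) for group, values in seen[variable].items()}
        for variable in variables
    }
    # Sort variables by decreasing number of distinct values overall
    return dict(
        sorted(counts.items(), key=lambda item: item[1].get("Overall", 0), reverse=True)
    )


# Hypothetical toy records, not taken from the registry data
records = [
    {"entity": "Breast", "Sex": "W", "Grading": "2"},
    {"entity": "Breast", "Sex": "W", "Grading": "3"},
    {"entity": "Prostate", "Sex": "M", "Grading": "1"},
]
counts = distinct_value_counts(records, ["Sex", "Grading"], group_key="entity")
for variable, per_group in counts.items():
    print(variable, per_group)
```

Keeping one set per variable and group means each record is visited only once, and the per-entity columns fall out of the same pass as the overall column.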