Table 3 “Out-of-the-box” overall results using the VHA evaluation corpus: exact, partial, and fully-contained matches with one PHI category, and with each PHI category separately

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

RULE-BASED SYSTEMS

| Overall results | Measure | EXACT: HMS Scrubber | EXACT: MeDS | EXACT: MIT deid | PARTIAL: HMS Scrubber | PARTIAL: MeDS | PARTIAL: MIT deid | FULLY-CONTAINED: HMS Scrubber | FULLY-CONTAINED: MeDS | FULLY-CONTAINED: MIT deid |
|---|---|---|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.01 (0.005-0.015) | 0.10 (0.085-0.115) | 0 | 0.32 (0.31-0.33) | 0.45 (0.435-0.465) | 0.81 (0.795-0.825) | 0.16 (0.15-0.17) | 0.14 (0.125-0.155) | 0.42 (0.40-0.44) |
| | R (CI) | 0.02 (0.015-0.025) | 0.21 (0.20-0.22) | 0 | 0.65 (0.64-0.66) | 0.78 (0.765-0.795) | 0.64 (0.625-0.655) | 0.34 (0.325-0.355) | 0.32 (0.305-0.335) | 0.36 (0.345-0.375) |
| | F2 (CI) | 0.02 (0.012-0.025) | 0.17 (0.16-0.18) | 0 | 0.54 (0.53-0.55) | 0.68 (0.665-0.695) | 0.67 (0.655-0.685) | 0.28 (0.27-0.29) | 0.25 (0.24-0.26) | 0.37 (0.355-0.385) |
| All PHI types | P (CI) | 0.01 (0.005-0.015) | 0.05 (0.045-0.055) | 0 | 0.23 (0.22-0.24) | 0.34 (0.325-0.365) | 0.76 (0.745-0.775) | 0.12 (0.115-0.125) | 0.10 (0.09-0.11) | 0.40 (0.335-0.465) |
| | R (CI) | 0.02 (0.0195-0.0215) | 0.14 (0.13-0.15) | 0 | 0.47 (0.455-0.485) | 0.60 (0.585-0.615) | 0.60 (0.585-0.615) | 0.26 (0.225-0.295) | 0.22 (0.205-0.235) | 0.34 (0.325-0.355) |
| | F2 (CI) | 0.02 (0.018-0.022) | 0.10 (0.09-0.11) | 0 | 0.39 (0.38-0.40) | 0.52 (0.505-0.535) | 0.63 (0.615-0.645) | 0.21 (0.195-0.225) | 0.18 (0.17-0.19) | 0.35 (0.315-0.385) |

MACHINE LEARNING-BASED SYSTEMS

| Overall results | Measure | EXACT: MIST | EXACT: HIDE | PARTIAL: MIST | PARTIAL: HIDE | FULLY-CONTAINED: MIST | FULLY-CONTAINED: HIDE |
|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.54 (0.52-0.56) | 0.50 (0.48-0.52) | 0.95 (0.935-0.965) | 0.89 (0.875-0.905) | 0.58 (0.56-0.60) | 0.56 (0.54-0.58) |
| | R (CI) | 0.25 (0.24-0.26) | 0.27 (0.26-0.28) | 0.46 (0.445-0.475) | 0.49 (0.475-0.505) | 0.28 (0.27-0.29) | 0.30 (0.29-0.31) |
| | F2 (CI) | 0.28 (0.265-0.295) | 0.30 (0.285-0.315) | 0.51 (0.495-0.525) | 0.54 (0.525-0.555) | 0.31 (0.295-0.325) | 0.33 (0.315-0.345) |
| All PHI types | P (CI) | 0.52 (0.495-0.545) | 0.48 (0.46-0.50) | 0.90 (0.885-0.915) | 0.84 (0.825-0.855) | 0.55 (0.525-0.575) | 0.52 (0.50-0.54) |
| | R (CI) | 0.24 (0.225-0.255) | 0.25 (0.24-0.26) | 0.44 (0.425-0.455) | 0.46 (0.445-0.475) | 0.27 (0.255-0.285) | 0.28 (0.265-0.295) |
| | F2 (CI) | 0.27 (0.255-0.285) | 0.28 (0.265-0.295) | 0.49 (0.475-0.505) | 0.50 (0.485-0.515) | 0.30 (0.285-0.315) | 0.31 (0.295-0.325) |

  1. CI: Confidence Interval obtained with a confidence level of 95%.
  2. One PHI = one overall PHI category considered.
  3. All PHI types = each PHI type evaluated separately.
  4. P = Precision; R = Recall; F2 = F2-measure.
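
The table does not spell out how the F2-measure is derived from precision and recall. A minimal sketch, assuming the standard F-beta definition with beta = 2 (which weights recall more heavily than precision), is given below. The function name `f_beta` is hypothetical. The sample rows are "One PHI" values copied from the table, and under this assumption the computed scores agree with the reported ones to two decimals.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta=2 is assumed here, not stated in the table."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero (e.g. the MIT deid exact-match column)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported F2) copied from the "One PHI" rows of the table
rows = {
    "MIST exact": (0.54, 0.25, 0.28),
    "HIDE exact": (0.50, 0.27, 0.30),
    "HMS Scrubber partial": (0.32, 0.65, 0.54),
    "MIT deid partial": (0.81, 0.64, 0.67),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F2 = {f_beta(p, r):.2f}, reported = {reported:.2f}")
```

Because the inputs are themselves rounded to two decimals, small disagreements in the last digit would not be surprising for other cells of the table.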