Table 5: 10-fold cross-validation overall results on the VHA evaluation corpus for exact, partial and fully-contained matches, with one overall PHI category and with each PHI type evaluated separately

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents
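The three matching criteria in the caption can be illustrated with a minimal Python sketch. It rests on several assumptions that the table itself does not state: annotations are half-open character-offset spans with a PHI category label, a partial match is any overlap of at least one character, and a fully-contained match means the reference annotation lies entirely inside the system annotation. The `Annotation` type, the function names and the `one_phi` flag are hypothetical and are not taken from MIST, HIDE or the paper's evaluation scripts. When `one_phi` is false, the sketch also requires categories to agree, which only loosely mirrors the "All PHI types" setting.

```python
from typing import Callable, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical PHI annotation: half-open character span [start, end) plus a category."""
    start: int
    end: int
    category: str


def exact_match(ref: Annotation, pred: Annotation) -> bool:
    # Both span boundaries must coincide.
    return ref.start == pred.start and ref.end == pred.end


def partial_match(ref: Annotation, pred: Annotation) -> bool:
    # Assumption: any overlap of at least one character counts.
    return ref.start < pred.end and pred.start < ref.end


def fully_contained_match(ref: Annotation, pred: Annotation) -> bool:
    # Assumption: the reference span must lie entirely inside the predicted span.
    return pred.start <= ref.start and ref.end <= pred.end


def matches(
    ref: Annotation,
    pred: Annotation,
    criterion: Callable[[Annotation, Annotation], bool],
    one_phi: bool = True,
) -> bool:
    # one_phi=True ignores categories (one overall PHI class).
    # one_phi=False also requires the categories to agree (hypothetical per-type setting).
    if not one_phi and ref.category != pred.category:
        return False
    return criterion(ref, pred)
```

Take a reference name spanning offsets 10 to 20 as an example. A system tag covering only 10 to 15 is a partial match, but neither an exact nor a fully-contained one. A tag covering 5 to 25 is both a partial and a fully-contained match, but not an exact one. Under this reading, the fully-contained criterion is stricter than partial overlap and looser than an exact match.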

10-fold cross-validation experiment: overall results

| PHI setting | Metric | Exact: MIST | Exact: HIDE | Partial: MIST | Partial: HIDE | Fully-contained: MIST | Fully-contained: HIDE |
|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.89 (0.88-0.90) | 0.88 (0.87-0.89) | 0.96 (0.95-0.97) | 0.95 (0.94-0.96) | 0.91 (0.90-0.92) | 0.91 (0.90-0.92) |
| One PHI | R (CI) | 0.64 (0.625-0.655) | 0.70 (0.685-0.715) | 0.70 (0.685-0.715) | 0.76 (0.75-0.77) | 0.67 (0.655-0.685) | 0.73 (0.72-0.74) |
| One PHI | F2 (CI) | 0.68 (0.665-0.695) | 0.73 (0.72-0.74) | 0.74 (0.725-0.755) | 0.79 (0.775-0.805) | 0.71 (0.70-0.72) | 0.76 (0.75-0.77) |
| All PHI types | P (CI) | 0.87 (0.855-0.885) | 0.87 (0.86-0.88) | 0.95 (0.94-0.96) | 0.92 (0.905-0.935) | 0.90 (0.885-0.915) | 0.89 (0.88-0.90) |
| All PHI types | R (CI) | 0.63 (0.615-0.655) | 0.69 (0.675-0.705) | 0.69 (0.675-0.705) | 0.74 (0.725-0.755) | 0.66 (0.645-0.675) | 0.71 (0.695-0.725) |
| All PHI types | F2 (CI) | 0.67 (0.655-0.685) | 0.72 (0.71-0.73) | 0.73 (0.713-0.745) | 0.77 (0.76-0.78) | 0.70 (0.685-0.715) | 0.74 (0.725-0.755) |

  1. CI: Confidence Interval obtained with a confidence level of 95%.
  2. One PHI = one overall PHI category considered.
  3. All PHI types = each PHI type evaluated separately.
  4. P = Precision; R = Recall; F2 = F2-measure.
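The F2-measure in note 4 is the F-beta measure with beta = 2, that is F2 = 5PR / (4P + R). It weights recall more heavily than precision. In de-identification, this emphasis is commonly motivated by the higher cost of leaking a missed PHI item. The minimal Python sketch below computes P, R and F2 from true-positive, false-positive and false-negative counts. The function names are illustrative only. Recomputing F2 from the rounded P and R of each cell in Table 5 reproduces the reported F2 values to two decimal places. The confidence intervals are not recomputed, because the table does not say how they were derived.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from match counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta measure; beta=2 gives the F2-measure, which favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# P and R for the "One PHI" exact-match cells of Table 5.
print(round(f_beta(0.89, 0.64), 2))  # MIST: 0.68
print(round(f_beta(0.88, 0.70), 2))  # HIDE: 0.73
```

The formula also explains the table's pattern. Both systems have high precision (0.87 to 0.96) but lower recall (0.63 to 0.76), so their F2 values stay close to recall.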