Table 5. 10-fold cross-validation overall results using the VHA evaluation corpus for exact, partial, and fully-contained matches, with one overall PHI category and with each PHI type evaluated separately

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

10-fold cross-validation experiment

| Overall results | Metric | Exact matches: MIST | Exact matches: HIDE | Partial matches: MIST | Partial matches: HIDE | Fully-contained matches: MIST | Fully-contained matches: HIDE |
|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.89 (0.88-0.90) | 0.88 (0.87-0.89) | 0.96 (0.95-0.97) | 0.95 (0.94-0.96) | 0.91 (0.90-0.92) | 0.91 (0.90-0.92) |
| | R (CI) | 0.64 (0.625-0.655) | 0.70 (0.685-0.715) | 0.70 (0.685-0.715) | 0.76 (0.75-0.77) | 0.67 (0.655-0.685) | 0.73 (0.72-0.74) |
| | F2 (CI) | 0.68 (0.665-0.695) | 0.73 (0.72-0.74) | 0.74 (0.725-0.755) | 0.79 (0.775-0.805) | 0.71 (0.70-0.72) | 0.76 (0.75-0.77) |
| All PHI types | P (CI) | 0.87 (0.855-0.885) | 0.87 (0.86-0.88) | 0.95 (0.94-0.96) | 0.92 (0.905-0.935) | 0.90 (0.885-0.915) | 0.89 (0.88-0.90) |
| | R (CI) | 0.63 (0.615-0.655) | 0.69 (0.675-0.705) | 0.69 (0.675-0.705) | 0.74 (0.725-0.755) | 0.66 (0.645-0.675) | 0.71 (0.695-0.725) |
| | F2 (CI) | 0.67 (0.655-0.685) | 0.72 (0.71-0.73) | 0.73 (0.713-0.745) | 0.77 (0.76-0.78) | 0.70 (0.685-0.715) | 0.74 (0.725-0.755) |
  1. CI: Confidence Interval obtained with a confidence level of 95%.
  2. One PHI = one overall PHI category considered.
  3. All PHI types = each PHI type evaluated separately.
  4. P = Precision; R = Recall; F2 = F2-measure.
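
The table does not spell out how the F2-measure is computed. As a minimal sketch, assuming it is the standard F-beta score with beta = 2 (which weights recall more heavily than precision), the reported F2 values for the "One PHI" exact-match columns can be recomputed from the rounded P and R values. The function name `f_beta` and the dictionary `exact_one_phi` are hypothetical names introduced for illustration only.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score (assumed definition, not stated in the table).

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    With beta = 2, recall counts more than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denom


# Rounded (P, R) pairs copied from the "One PHI", exact-match columns.
exact_one_phi = {"MIST": (0.89, 0.64), "HIDE": (0.88, 0.70)}

for system, (p, r) in exact_one_phi.items():
    print(system, round(f_beta(p, r), 2))
# prints:
# MIST 0.68
# HIDE 0.73
```

Both values agree with the F2 row of the table. Because the inputs are already rounded to two decimals, a recomputed value could in principle differ from a published cell in the last digit.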