Table 3 “Out-of-the-box” overall results on the VHA evaluation corpus for exact, partial and fully-contained matches, with one overall PHI category and with each PHI category evaluated separately

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

RULE-BASED SYSTEMS

| Overall results | Metric | EXACT MATCHES: HMS Scrubber | EXACT MATCHES: MeDS | EXACT MATCHES: MIT deid | PARTIAL MATCHES: HMS Scrubber | PARTIAL MATCHES: MeDS | PARTIAL MATCHES: MIT deid | FULLY-CONTAINED MATCHES: HMS Scrubber | FULLY-CONTAINED MATCHES: MeDS | FULLY-CONTAINED MATCHES: MIT deid |
|---|---|---|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.01 (0.005-0.015) | 0.10 (0.085-0.115) | 0 | 0.32 (0.31-0.33) | 0.45 (0.435-0.465) | 0.81 (0.795-0.825) | 0.16 (0.15-0.17) | 0.14 (0.125-0.155) | 0.42 (0.40-0.44) |
| | R (CI) | 0.02 (0.015-0.025) | 0.21 (0.20-0.22) | 0 | 0.65 (0.64-0.66) | 0.78 (0.765-0.795) | 0.64 (0.625-0.655) | 0.34 (0.325-0.355) | 0.32 (0.305-0.335) | 0.36 (0.345-0.375) |
| | F2 (CI) | 0.02 (0.012-0.025) | 0.17 (0.16-0.18) | 0 | 0.54 (0.53-0.55) | 0.68 (0.665-0.695) | 0.67 (0.655-0.685) | 0.28 (0.27-0.29) | 0.25 (0.24-0.26) | 0.37 (0.355-0.385) |
| All PHI types | P (CI) | 0.01 (0.005-0.015) | 0.05 (0.045-0.055) | 0 | 0.23 (0.22-0.24) | 0.34 (0.325-0.365) | 0.76 (0.745-0.775) | 0.12 (0.115-0.125) | 0.10 (0.09-0.11) | 0.40 (0.335-0.465) |
| | R (CI) | 0.02 (0.0195-0.0215) | 0.14 (0.13-0.15) | 0 | 0.47 (0.455-0.485) | 0.60 (0.585-0.615) | 0.60 (0.585-0.615) | 0.26 (0.225-0.295) | 0.22 (0.205-0.235) | 0.34 (0.325-0.355) |
| | F2 (CI) | 0.02 (0.018-0.022) | 0.10 (0.09-0.11) | 0 | 0.39 (0.38-0.40) | 0.52 (0.505-0.535) | 0.63 (0.615-0.645) | 0.21 (0.195-0.225) | 0.18 (0.17-0.19) | 0.35 (0.315-0.385) |

MACHINE LEARNING-BASED SYSTEMS

| Overall results | Metric | EXACT MATCHES: MIST | EXACT MATCHES: HIDE | PARTIAL MATCHES: MIST | PARTIAL MATCHES: HIDE | FULLY-CONTAINED MATCHES: MIST | FULLY-CONTAINED MATCHES: HIDE |
|---|---|---|---|---|---|---|---|
| One PHI | P (CI) | 0.54 (0.52-0.56) | 0.50 (0.48-0.52) | 0.95 (0.935-0.965) | 0.89 (0.875-0.905) | 0.58 (0.56-0.60) | 0.56 (0.54-0.58) |
| | R (CI) | 0.25 (0.24-0.26) | 0.27 (0.26-0.28) | 0.46 (0.445-0.475) | 0.49 (0.475-0.505) | 0.28 (0.27-0.29) | 0.30 (0.29-0.31) |
| | F2 (CI) | 0.28 (0.265-0.295) | 0.30 (0.285-0.315) | 0.51 (0.495-0.525) | 0.54 (0.525-0.555) | 0.31 (0.295-0.325) | 0.33 (0.315-0.345) |
| All PHI types | P (CI) | 0.52 (0.495-0.545) | 0.48 (0.46-0.50) | 0.90 (0.885-0.915) | 0.84 (0.825-0.855) | 0.55 (0.525-0.575) | 0.52 (0.50-0.54) |
| | R (CI) | 0.24 (0.225-0.255) | 0.25 (0.24-0.26) | 0.44 (0.425-0.455) | 0.46 (0.445-0.475) | 0.27 (0.255-0.285) | 0.28 (0.265-0.295) |
| | F2 (CI) | 0.27 (0.255-0.285) | 0.28 (0.265-0.295) | 0.49 (0.475-0.505) | 0.50 (0.485-0.515) | 0.30 (0.285-0.315) | 0.31 (0.295-0.325) |

  1. CI: Confidence Interval obtained with a confidence level of 95%.
  2. One PHI = one overall PHI category considered.
  3. All PHI types = each PHI type evaluated separately.
  4. P = Precision; R = Recall; F2 = F2-measure.
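
The F2 values in the table can be checked against the reported precision and recall. The sketch below assumes the F2-measure is the standard F-beta score with beta = 2, which weights recall more heavily than precision; the table does not state the formula, and the function name `f_beta` is illustrative. Under that assumption, the partial-match "One PHI" rows reproduce the reported F2 values to two decimals.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score (assumed definition; beta=2 favors recall)."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: avoid dividing by zero when nothing matched
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Partial-match "One PHI" rows from the table:
# system -> (precision, recall, reported F2)
rows = {
    "HMS Scrubber": (0.32, 0.65, 0.54),
    "MeDS": (0.45, 0.78, 0.68),
    "MIT deid": (0.81, 0.64, 0.67),
    "MIST": (0.95, 0.46, 0.51),
}

for system, (p, r, reported) in rows.items():
    print(f"{system}: computed F2 = {f_beta(p, r):.2f}, reported = {reported:.2f}")
```

Because the published precision and recall are themselves rounded, a recomputed F2 can differ from the reported one in the last digit for other rows.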