Table 2 PHI category distribution and mapping for the VHA, i2b2 and Swedish Stockholm EPR corpora

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

| VHA corpus | Instances | i2b2 corpus | Instances | Stockholm EPR De-identified Corpus | Instances |
| --- | --- | --- | --- | --- | --- |
| Patient Name | 206 (3.88%) | Patients | 929 (4.76%) | Person Name: First Name | 923 (20.87%) |
| Relative Name | 30 (0.55%) | | | | |
| Other Person Name | 20 (0.37%) | | | Person Name: Last Name | 929 (21%) |
| Healthcare Provider Name | 492 (9.08%) | Doctors | 3751 (19.24%) | | |
| Street City | 137 (2.53%) | Locations | 263 (1.35%) | Location | 148 (3.35%) |
| State Country | 161 (2.97%) | | | | |
| Zip code | 4 (0.07%) | | | | |
| Deployment | 43 (0.79%) | - | - | - | - |
| Healthcare Unit Name | 1453 (26.83%) | Hospitals | 2400 (12.31%) | Health_Care_Unit | 1021 (23.08%) |
| Other Organization | 86 (1.59%) | - | - | - | - |
| Date | 2547 (47.03%) | Dates | 7098 (36.40%) | Date_Part | 710 (16.05%) |
| | | | | Full_Date | 500 (11.30%) |
| Age > 89 | 4 (0.07%) | Ages | 16 (0.08%) | Age | 56 (1.27%) |
| Phone Number | 90 (1.66%) | Phone Numbers | 232 (1.19%) | Phone Number | 136 (3.07%) |
| Electronic Address | 4 (0.07%) | - | - | - | - |
| SSN | 16 (0.30%) | IDs | 4809 (24.66%) | - | - |
| Other ID Number | 123 (2.27%) | | | - | - |
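
The percentages in Table 2 appear to be each category's share of all PHI instances annotated in its corpus. The table does not state this, so treat it as an assumption. The minimal sketch below recomputes the shares for the Stockholm EPR column from the counts listed above; the names `stockholm_counts` and `category_shares` are hypothetical and chosen for illustration only.

```python
# Counts copied from the Stockholm EPR column of Table 2.
stockholm_counts = {
    "First Name": 923,
    "Last Name": 929,
    "Location": 148,
    "Health_Care_Unit": 1021,
    "Date_Part": 710,
    "Full_Date": 500,
    "Age": 56,
    "Phone Number": 136,
}


def category_shares(counts):
    """Return each category's share of the corpus total, in percent.

    Assumption: the denominator is the sum of all category counts
    in the same corpus. Values are rounded to two decimal places.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = category_shares(stockholm_counts)
for name, pct in shares.items():
    print(f"{name}: {stockholm_counts[name]} ({pct:.2f}%)")
```

The same function can be applied to the other two columns by swapping in their counts, which gives a quick consistency check on a transcribed table.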