Table 2 PHI category distribution and mapping for the VHA, i2b2 and Swedish Stockholm EPR corpora

From: Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

| VHA corpus | Instances | i2b2 corpus | Instances | Stockholm EPR De-identified Corpus | Instances |
| --- | --- | --- | --- | --- | --- |
| Patient Name | 206 (3.88%) | Patients | 929 (4.76%) | Person Name: First Name | 923 (20.87%) |
| Relative Name | 30 (0.55%) | | | | |
| Other Person Name | 20 (0.37%) | | | Person Name: Last Name | 929 (21%) |
| Healthcare Provider Name | 492 (9.08%) | Doctors | 3751 (19.24%) | | |
| Street City | 137 (2.53%) | Locations | 263 (1.35%) | Location | 148 (3.35%) |
| State Country | 161 (2.97%) | | | | |
| Zip code | 4 (0.07%) | | | | |
| Deployment | 43 (0.79%) | - | - | - | - |
| Healthcare Unit Name | 1453 (26.83%) | Hospitals | 2400 (12.31%) | Health_Care_Unit | 1021 (23.08%) |
| Other Organization | 86 (1.59%) | - | - | - | - |
| Date | 2547 (47.03%) | Dates | 7098 (36.40%) | Date_Part | 710 (16.05%) |
| | | | | Full_Date | 500 (11.30%) |
| Age > 89 | 4 (0.07%) | Ages | 16 (0.08%) | Age | 56 (1.27%) |
| Phone Number | 90 (1.66%) | Phone Numbers | 232 (1.19%) | Phone Number | 136 (3.07%) |
| Electronic Address | 4 (0.07%) | - | - | - | - |
| SSN | 16 (0.30%) | IDs | 4809 (24.66%) | - | - |
| Other ID Number | 123 (2.27%) | | | - | - |
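
The percentages in the table appear to be each category's share of all annotated instances in its corpus. The following is a minimal sketch that recomputes those shares for the VHA column under that assumption. The names `vha_counts` and `shares` are illustrative and do not come from the article; the counts are copied from the table.

```python
# VHA instance counts copied from Table 2 (VHA corpus column).
vha_counts = {
    "Patient Name": 206,
    "Relative Name": 30,
    "Other Person Name": 20,
    "Healthcare Provider Name": 492,
    "Street City": 137,
    "State Country": 161,
    "Zip code": 4,
    "Deployment": 43,
    "Healthcare Unit Name": 1453,
    "Other Organization": 86,
    "Date": 2547,
    "Age > 89": 4,
    "Phone Number": 90,
    "Electronic Address": 4,
    "SSN": 16,
    "Other ID Number": 123,
}


def shares(counts):
    """Return each category's share of the column total, in percent.

    Assumption: the published percentages use the sum of the column
    as the denominator. The article does not state this explicitly.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


vha_shares = shares(vha_counts)

for name, pct in vha_shares.items():
    print(f"{name}: {vha_counts[name]} ({pct:.2f}%)")
```

Comparing the recomputed shares with the published column is a quick way to spot transcription or rounding differences. The same function can be applied to the i2b2 and Stockholm EPR counts.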