Table 3 Resources used by systems mostly based on pattern matching and/or rule-based methods.

From: Automatic de-identification of textual documents in the electronic health record: a review of recent research

| De-identification system | Knowledge resources | Principal methods |
| --- | --- | --- |
| Beckwith | Lists of proper names, locations | Regular expressions and dictionaries. |
| Berman | UMLS Metathesaurus, stop words | Dictionaries |
| Fielstein | Lists of cities and VA PHI (patient names, SSNs, MRNs...) | Regular expressions and dictionaries. |
| Friedlin | Lists of names (including Regenstrief patients), locations. | Regular expressions and dictionaries; identifiers in HL7 messages. |
| Gupta (De-ID system) | UMLS Metathesaurus, institution-specific identifiers | Regular expressions and dictionaries; identifiers in report headers. |
| Morrison (MedLEE) | MedLEE lexicon and UMLS Metathesaurus. | Rules/grammar-based, with dictionaries. |
| Neamatullah | Lists of common English words (non-PHI), names, locations, UMLS Metathesaurus and other medical terms, known patients and healthcare providers in the institution. | Regular expressions and dictionaries. |
| Ruch | MEDTAG lexicon (enriched with healthcare institution names, drug names, procedures, and devices) | Rule-based, with dictionaries. |
| Sweeney | Lists of names, U.S. states, countries, medical terms. | Rule-based, with dictionaries. |
| Thomas | List of names, UMLS Metathesaurus, Ispell terms. | Regular expressions and dictionaries. |
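
Most systems listed above pair regular expressions with dictionary lookups. The following is a minimal Python sketch of that general combination, not a reconstruction of any listed system. The word lists, the patterns, and the names `NAME_DICTIONARY`, `LOCATION_DICTIONARY`, `PATTERNS`, and `deidentify` are all hypothetical and chosen only for illustration; real systems rely on far larger, institution-specific resources.

```python
import re

# Hypothetical toy dictionaries; the reviewed systems use much larger lists
# (e.g. name lists, location lists, known patients of the institution).
NAME_DICTIONARY = {"smith", "jones", "garcia"}
LOCATION_DICTIONARY = {"boston", "seattle"}

# Hypothetical patterns for identifiers with a predictable surface form.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)),
]

# Alphabetic tokens are the candidates for dictionary lookup.
TOKEN = re.compile(r"[A-Za-z]+")


def deidentify(text):
    """Replace pattern matches and dictionary hits with bracketed labels."""
    # Pass 1: regular expressions for structured identifiers.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)

    # Pass 2: case-insensitive dictionary lookup on each alphabetic token.
    def replace_token(match):
        word = match.group(0)
        lowered = word.lower()
        if lowered in NAME_DICTIONARY:
            return "[NAME]"
        if lowered in LOCATION_DICTIONARY:
            return "[LOCATION]"
        return word

    return TOKEN.sub(replace_token, text)


note = "Mr. Smith, SSN 123-45-6789, seen in Boston on 3/14/2005. MRN: 00123456"
print(deidentify(note))
```

A plain dictionary lookup like this one would also redact clinical terms that happen to be names. That is presumably one reason several of the systems above also draw on lists of medical terms or common English words, such as the UMLS Metathesaurus, to recognize non-PHI vocabulary.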