Table 3 Resources used by systems mostly based on pattern matching and/or rule-based methods.

From: Automatic de-identification of textual documents in the electronic health record: a review of recent research

De-identification system | Knowledge resources | Principal methods
Beckwith | Lists of proper names, locations | Regular expressions and dictionaries.
Berman | UMLS Metathesaurus, stop words | Dictionaries
Fielstein | Lists of cities and VA PHI (patient names, SSNs, MRNs...) | Regular expressions and dictionaries.
Friedlin | Lists of names (including Regenstrief patients), locations. | Regular expressions and dictionaries; identifiers in HL7 messages.
Gupta (De-ID system) | UMLS Metathesaurus, institution-specific identifiers | Regular expressions and dictionaries; identifiers in report headers.
Morrison (MedLEE) | MedLEE lexicon and UMLS Metathesaurus. | Rules/grammar-based, with dictionaries.
Neamatullah | Lists of common English words (non-PHI), names, locations, UMLS Metathesaurus and other medical terms, known patients and healthcare providers in the institution. | Regular expressions and dictionaries.
Ruch | MEDTAG lexicon (enriched with healthcare institution names, drug names, procedures, and devices) | Rule-based, with dictionaries.
Sweeney | Lists of names, U.S. states, countries, medical terms. | Rule-based, with dictionaries.
Thomas | List of names, UMLS Metathesaurus, Ispell terms. | Regular expressions and dictionaries.
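
Most systems in the table combine regular expressions with dictionary lookups. The sketch below illustrates that general combination only; it is not the implementation of any listed system. The patterns, word lists, placeholder tags, and the function name `deidentify` are all hypothetical, chosen for illustration, and a real system would rely on far larger resources such as those in the "Knowledge resources" column.

```python
import re

# Hypothetical patterns for structured identifiers (illustration only).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MRN_RE = re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)

# Hypothetical dictionaries; real systems use much larger lists.
NAME_LIST = {"smith", "garcia"}
LOCATION_LIST = {"boston", "indianapolis"}
# Terms that must never be redacted (assumed analogue of a non-PHI word list).
SAFE_WORDS = {"fracture", "aspirin"}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def deidentify(text):
    """Replace pattern matches and dictionary hits with placeholder tags."""
    # Step 1: regular expressions catch identifiers with a fixed shape.
    text = SSN_RE.sub("[SSN]", text)
    text = MRN_RE.sub("[MRN]", text)

    # Step 2: dictionary lookup on each alphabetic token.
    def replace_token(match):
        word = match.group(0)
        key = word.lower()
        if key in SAFE_WORDS:
            return word
        if key in NAME_LIST:
            return "[NAME]"
        if key in LOCATION_LIST:
            return "[LOCATION]"
        return word

    return TOKEN_RE.sub(replace_token, text)


note = "Mr. Smith, MRN: 1234567, SSN 123-45-6789, seen in Boston for a fracture."
print(deidentify(note))
# prints: Mr. [NAME], [MRN], SSN [SSN], seen in [LOCATION] for a fracture.
```

Checking the safe-word list before the name and location lists reflects one possible design choice: a clinical term that also appears in a name list is kept rather than redacted. Whether that trade-off is acceptable depends on the institution's tolerance for missed identifiers.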