
Table 1 Automatic de-identification systems and their principal characteristics

From: Automatic de-identification of textual documents in the electronic health record: a review of recent research

| 1st author | System Name | Availability/License | Programming language/Resources (when known) | Knowledge resources | Document Types |
|---|---|---|---|---|---|
| Aramaki [23] | System for the i2b2 de-identification challenge | Not publicly available | CRF++¹ | Lists of names, locations, dates | Discharge summaries |
| Beckwith [14] | HMS Scrubber | Open source (GNU LGPL v2) | Java, JDOM, MySQL | Lists of names, locations | Surgical pathology reports |
| Berman [5] | Concept-Match | System freely available | Perl | UMLS Metathesaurus | Surgical pathology reports |
| Fielstein [7] | (VA system) | Not publicly available | Perl | Lists of names, locations, email addresses | VA compensation and pension examinations |
| Friedlin [8] | MeDS | Not publicly available | Java | Lists of names, locations, medical terms | HL7 messages |
| Gardner [24] | HIDE | Open source (Common Public License v1) | Perl, Java, Mallet² | None | Surgical pathology reports |
| Guo [25] | System for the i2b2 de-identification challenge | Not publicly available | GATE³ (ANNIE, JAPE), Java, SVMlight⁴ | Lists of locations, hospitals | Discharge summaries |
| Gupta [15] | DE-ID (DE-ID Data Corp., Richboro, PA) | Commercial system, not freely available | Unknown | List of U.S. census names, user-defined dictionaries | Surgical pathology reports |
| Hara [27] | System for the i2b2 de-identification challenge | Not publicly available | C++, BACT and YamCha⁵ | None | Discharge summaries |
| Morrison [18] | MedLEE | Not publicly available | Prolog | MedLEE lexicon, UMLS Metathesaurus | Outpatient follow-up notes |
| Neamatullah [9] | (MIT system) | Open source (GNU GPL v2) | Perl | Lists of common English words (non-PHI), terms indicating PHI, names and locations, known PHI (patients and staff list) | Nursing progress notes, discharge summaries |
| Ruch [19] | MEDTAG framework-based | Not publicly available | Unknown | MEDTAG lexicon (based on UMLS Metathesaurus; only in French) | Various clinical documents (multilingual) |
| Sweeney [20] | Scrub | Not publicly available | Unknown | Lists of area codes, names | Various clinical documents |
| Szarvas [28] | System for the i2b2 de-identification challenge | Not publicly available | Weka⁶ | Lists of first names, locations, diseases, non-PHI (general English) | Discharge summaries |
| Taira [30] | (UCLA system) | Not publicly available | Unknown | List of names and drugs | Various clinical documents |
| Thomas [33] | (Regenstrief Institute system) | Not publicly available | Java, XSL | List of names, UMLS Metathesaurus terms | Surgical pathology reports |
| Uzuner [31] | Stat De-id | Not publicly available (open source release planned) | LIBSVM⁷ | MeSH terms, lists of names, locations, and hospitals | Discharge summaries |
| Wellner [32] | System for the i2b2 de-identification challenge | Open source (BSD) | OCaml⁸, Carafe⁹ | Lists of US states, months, common English words | Discharge summaries |

  1. http://crfpp.sourceforge.net/
  2. http://mallet.cs.umass.edu/
  3. http://gate.ac.uk/
  4. http://svmlight.joachims.org/
  5. http://www.chasen.org/~taku/software/
  6. http://www.cs.waikato.ac.nz/ml/weka/
  7. http://www.csie.ntu.edu.tw/~cjlin/libsvm
  8. http://caml.inria.fr/ocaml/index.en.html
  9. http://sourceforge.net/projects/carafe/