Table 1 Automatic de-identification systems and their principal characteristics

From: Automatic de-identification of textual documents in the electronic health record: a review of recent research

| First author | System name | Availability/License | Programming language/Resources (when known) | Knowledge resources | Document types |
| --- | --- | --- | --- | --- | --- |
| Aramaki [23] | System for the i2b2 de-identification challenge | Not publicly available | CRF++¹ | Lists of names, locations, dates | Discharge summaries |
| Beckwith [14] | HMS Scrubber | Open source (GNU LGPL v2) | Java, JDOM, MySQL | Lists of names, locations | Surgical pathology reports |
| Berman [5] | Concept-Match | System freely available | Perl | UMLS Metathesaurus | Surgical pathology reports |
| Fielstein [7] | (VA system) | Not publicly available | Perl | Lists of names, locations, email addresses | VA compensation and pension examinations |
| Friedlin [8] | MeDS | Not publicly available | Java | Lists of names, locations, medical terms | HL7 messages |
| Gardner [24] | HIDE | Open source (Common Public License v1) | Perl, Java, Mallet² | None | Surgical pathology reports |
| Guo [25] | System for the i2b2 de-identification challenge | Not publicly available | GATE³ (ANNIE, JAPE), Java, SVMlight⁴ | Lists of locations, hospitals | Discharge summaries |
| Gupta [15] | DE-ID (DE-ID Data Corp., Richboro, PA) | Commercial system, not freely available | Unknown | List of U.S. census names, user-defined dictionaries | Surgical pathology reports |
| Hara [27] | System for the i2b2 de-identification challenge | Not publicly available | C++, BACT and YamCha⁵ | None | Discharge summaries |
| Morrison [18] | MedLEE | Not publicly available | Prolog | MedLEE lexicon, UMLS Metathesaurus | Outpatient follow-up notes |
| Neamatullah [9] | (MIT system) | Open source (GNU GPL v2) | Perl | Lists of common English words (non-PHI), terms indicating PHI, names and locations, known PHI (patient and staff lists) | Nursing progress notes, discharge summaries |
| Ruch [19] | MEDTAG framework-based | Not publicly available | Unknown | MEDTAG lexicon (based on UMLS Metathesaurus; only in French) | Various clinical documents (multilingual) |
| Sweeney [20] | Scrub | Not publicly available | Unknown | Lists of area codes, names | Various clinical documents |
| Szarvas [28] | System for the i2b2 de-identification challenge | Not publicly available | Weka⁶ | Lists of first names, locations, diseases, non-PHI (general English) | Discharge summaries |
| Taira [30] | (UCLA system) | Not publicly available | Unknown | List of names and drugs | Various clinical documents |
| Thomas [33] | (Regenstrief Institute system) | Not publicly available | Java, XSL | List of names, UMLS Metathesaurus terms | Surgical pathology reports |
| Uzuner [31] | Stat De-id | Not publicly available (open source release planned) | LIBSVM⁷ | MeSH terms, lists of names, locations, and hospitals | Discharge summaries |
| Wellner [32] | System for the i2b2 de-identification challenge | Open source (BSD) | OCaml⁸, Carafe⁹ | Lists of US states, months, common English words | Discharge summaries |

  1. http://crfpp.sourceforge.net/
  2. http://mallet.cs.umass.edu/
  3. http://gate.ac.uk/
  4. http://svmlight.joachims.org/
  5. http://www.chasen.org/~taku/software/
  6. http://www.cs.waikato.ac.nz/ml/weka/
  7. http://www.csie.ntu.edu.tw/~cjlin/libsvm
  8. http://caml.inria.fr/ocaml/index.en.html
  9. http://sourceforge.net/projects/carafe/