| | | HMS Scrubber | MeDS | MIT deid | MIST | HIDE |
|---|---|---|---|---|---|---|
| Main technique | Rule-based | X | X | X | n/a | n/a |
| | ML-based | n/a | n/a | n/a | X | X |
| Programming language | | Java | Java | Perl | Python | Python |
| ML algorithm | | n/a | n/a | n/a | CRF (Carafe) | CRF (CRFsuite) |
| Input documents | | XML/txt | HL7/txt | txt | txt/XML-inline/json | XML/txt/HL7 |
| HIPAA compliant | | X | X | X | [1] | [1] |
| Regular Expressions (#) | | ~50 | ~40 | ~90 | [2] | [2] |
| PHI markers (e.g., Mr.) | | X | X | X | [3] | -- |
| Part-of-speech information | | -- | X | -- | -- | -- |
| String similarity techniques (e.g., edit distance, fuzzy matching) | | -- | X | -- | -- | -- |
| Dictionaries* (size) | Person names | ~101K | ~280K | ~96K [4] | -- | -- |
| | Geographic places | ~167K | ~4K | -- | -- | |
| | US area code | -- | -- | ~380 | -- | -- |
| | Medical phrases | -- | ~50 | ~28 | -- | -- |
| | Medical terms | -- | ~80K | ~175K | -- | -- |
| | Companies | -- | ~200 | ~500 | -- | -- |
| | Ethnicities | -- | ~120 | ~195 | -- | -- |
| | Common words | -- | ~220K | ~50K | -- | -- |
| Machine Learning features | Contextual window | n/a | n/a | n/a | 3-words | 4-words |
| | Morphological (#) | n/a | n/a | n/a | 22 | 34 |
| | Syntactic | n/a | n/a | n/a | -- | -- |
| | Semantic | n/a | n/a | n/a | -- | -- |
| | From dictionaries | n/a | n/a | n/a | [5] | [5] |
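
The machine learning feature rows in the table (contextual window, morphological, from dictionaries) can be illustrated with a minimal sketch of per-token feature extraction of the kind typically fed to a CRF tagger. This is not the feature set of MIST or HIDE: the function name `token_features`, the toy `NAME_DICTIONARY`, the specific morphological features, and the reading of "3-words" as three tokens on each side are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary (assumption); real systems load far larger lists.
NAME_DICTIONARY = {"smith", "jones", "maria"}


def token_features(tokens, i, window=3, dictionary=NAME_DICTIONARY):
    """Return a feature dict for tokens[i] covering three feature families."""
    tok = tokens[i]

    # Word shape: digits -> 9, uppercase -> X, lowercase -> x.
    shape = re.sub(r"\d", "9", tok)
    shape = re.sub(r"[A-Z]", "X", shape)
    shape = re.sub(r"[a-z]", "x", shape)

    feats = {
        # Morphological features: surface form of the token itself.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),
        "shape": shape,
        # Dictionary feature: membership in a term list.
        "in_name_dict": tok.lower() in dictionary,
    }

    # Contextual window: neighbouring tokens on each side (assumed symmetric).
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if in_range else "<PAD>"

    return feats


tokens = ["Mr.", "Smith", "was", "seen", "on", "12/03/2009"]
features = token_features(tokens, 1)
```

Here `features` describes the token `Smith`: its shape, its membership in the toy name list, and its neighbours, including the PHI marker `mr.` one position to the left. A sequence labeller would receive one such dict per token.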