
Table 4 Machine learning features

From: Current approaches to identify sections within clinical narratives from electronic health records: a systematic review

| Reference | Lexical | Syntactical | Semantic | Contextual | Method |
|---|---|---|---|---|---|
| Bramsen et al. [3] | U, N | POS | RT, AT, T | LP | ML |
| Haug et al. [17] | N | NS | NS | NS | ML |
| Chen et al. [5], Dai et al. [7] | C, A | Pun | ST | WL | H |
| Deléger and Névéol [8] | NS | NS | NS | NS | ML |
| Ni et al. [40] | NS | NS | NS | NS | H |
| Jancsary et al. [20] | N | POS, Pun | ST | LP | H |
| Cho et al. [6] | NS | NS | ST | SS, OS | H |
| Li et al. [29] | N | NS | NS | SB | ML |
| Lohr et al. [31] | U | NS | NS | NS | ML |
| Ganesan and Subotin [16] | U, N, C | NS | ST | LP, LL, LC, CC | H |
| Tepper et al. [57] | U, C | Nu | NS | LP, WL, SS, SB | ML |
| Sadoughi et al. [46] | NS | NS | NS | SB | H |
| Apostolova et al. [1] | N, C | Pun | ST | LP, WL, SB | H |
| Mowery et al. [39] | U, N | POS, VT | ST, DI, MN | LP, LL, SB | ML |
| Waranusast et al. [62] | NS | NS | NS | SB | ML |
NS = Not specified.
Lexical: U = Unigram, N = N-gram, C = Capitalized, A = Affixes.
Syntactical: POS = Part of speech of the word, VT = Verb tense, Pun = Punctuation, Nu = Contains or begins with a number.
Semantic: ST = Semantic type (e.g., UMLS, LOINC), DI = De-identification tag, MN = Meaning of the number (e.g., phone, dose), RT = Is a relative temporal word (e.g., later, next, until), AT = Is an absolute temporal word (e.g., am, pm), T = Topic of the section.
Contextual: LP = Line position in the document, LL = Line length, WL = Blank lines before and after a line, LC = Change in length from one line to the next, SS = Section size, SB = Previous and following section boundaries, OS = Order of sections, CC = Use of capitals and colons.
Method: ML = Machine learning, H = Hybrid.
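Several of the contextual and surface features in the legend above (LP, LL, WL, LC, CC, capitalization, and Nu) can be computed per line with plain string operations. The Python sketch below is a hypothetical illustration. It is not code from any of the reviewed studies. The exact definitions are assumptions chosen for clarity: position normalized to the range 0 to 1, length counted in characters after stripping whitespace, and the capital-and-colon cue split into an all-caps flag and a trailing-colon flag. The function and field names are likewise invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineFeatures:
    # Assumed per-line definitions; the reviewed studies may define these differently.
    line_position: float      # LP: relative position of the line (0.0 = first, 1.0 = last)
    line_length: int          # LL: characters in the line after stripping whitespace
    blank_before: bool        # WL: previous line exists and is blank
    blank_after: bool         # WL: next line exists and is blank
    length_change: int        # LC: length of this line minus length of the previous line
    all_caps: bool            # C / CC: every cased character in the line is upper case
    ends_with_colon: bool     # CC: line ends with a colon, as in "MEDICATIONS:"
    starts_with_number: bool  # Nu: line begins with a digit


def extract_line_features(text: str) -> List[LineFeatures]:
    """Compute hypothetical line-level features for a clinical note."""
    lines = text.splitlines()
    n = len(lines)
    features: List[LineFeatures] = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        # The first line has no predecessor; treat the previous line as empty for LC.
        prev = lines[i - 1].strip() if i > 0 else ""
        nxt = lines[i + 1].strip() if i < n - 1 else ""
        features.append(LineFeatures(
            line_position=i / (n - 1) if n > 1 else 0.0,
            line_length=len(line),
            blank_before=i > 0 and prev == "",
            blank_after=i < n - 1 and nxt == "",
            length_change=len(line) - len(prev),
            all_caps=line.isupper(),
            ends_with_colon=line.endswith(":"),
            starts_with_number=line[:1].isdigit(),
        ))
    return features


if __name__ == "__main__":
    note = ("HISTORY OF PRESENT ILLNESS:\nPatient reports chest pain.\n\n"
            "MEDICATIONS:\n1. Aspirin 81 mg daily")
    for text_line, f in zip(note.splitlines(), extract_line_features(note)):
        print(repr(text_line), f)
```

In a machine learning or hybrid system of the kind listed in the table, these values would typically be combined with the lexical and semantic features of each line and passed to a classifier or to rules that decide whether the line starts a new section. The all-caps and trailing-colon flags alone already separate header-like lines such as "MEDICATIONS:" from body text in the sample note.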