Current approaches to identify sections within clinical narratives from electronic health records: a systematic review

Table 4 Machine learning features

Reference	Lexical	Syntactical	Semantic	Contextual	Method
Bramsen et al. [3]	U,N	POS	RT,AT,T	LP	ML
Haug et al. [17]	N	NS	NS	NS	ML
Chen et al. [5], Dai et al. [7]	C,A	Pun	ST	WL	H
Deléger and Névéol [8]	NS	NS	NS	NS	ML
Ni et al. [40]	NS	NS	NS	NS	H
Jancsary et al. [20]	N	POS,Pun	ST	LP	H
Cho et al. [6]	NS	NS	ST	SS,OS	H
Li et al. [29]	N	NS	NS	SB	ML
Lohr et al. [31]	U	NS	NS	NS	ML
Ganesan and Subotin [16]	U,N,C	NS	ST	LP,LL,LC,CC	H
Tepper et al. [57]	U,C	Nu	NS	LP,WL,SS,SB	ML
Sadoughi et al. [46]	NS	NS	NS	SB	H
Apostolova et al. [1]	N,C	Pun	ST	LP,WL,SB	H
Mowery et al. [39]	U,N	POS,VT	ST,DI,MN	LP,LL,SB	ML
Waranusast et al. [62]	NS	NS	NS	SB	ML

NS=Not Specified.
Lexical: U=Unigram, N=N-gram, C=Capitalized, A=Affixes.
Syntactical: POS=Word part of speech, VT=Verb tense, Pun=Punctuation, Nu=Contains or begins with a number.
Semantic: ST=Semantic type (e.g. UMLS, LOINC), DI=De-identification tag, MN=Meaning of the number (e.g. phone, dose), RT=Relative temporal word (e.g. later, next, until), AT=Absolute temporal word (e.g. am, pm), T=Topic of the section.
Contextual: LP=Line position in the document, LL=Length of a line, WL=White (blank) lines before and after a line, LC=Length change from one line to another, SS=Section size, SB=Previous and following section boundaries, OS=Order of sections, CC=Capital and colon use.
Method: ML=Machine Learning, H=Hybrid.
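To make the feature abbreviations more concrete, the following is a minimal sketch of how a few of the lexical, syntactical, and contextual features in Table 4 (U, C, Nu, LP, LL, LC, WL, CC) might be computed for one line of a note. It is not taken from any of the reviewed studies: the function name `line_features`, the header pattern `HEADER_RE`, the feature definitions, and the sample note are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for the CC feature: a capitalized phrase ending in a colon.
HEADER_RE = re.compile(r"^\s*[A-Z][A-Za-z /]*:")


def line_features(lines, i):
    """Return a few Table 4 style features for lines[i] (illustrative only)."""
    line = lines[i]
    prev_len = len(lines[i - 1]) if i > 0 else 0
    return {
        # Lexical
        "U": re.findall(r"\w+", line.lower()),       # unigrams
        "C": line.strip()[:1].isupper(),             # line starts with a capital
        # Syntactical
        "Nu_contains": bool(re.search(r"\d", line)),
        "Nu_begins": bool(re.match(r"\s*\d", line)),
        # Contextual
        "LP": i / (len(lines) - 1) if len(lines) > 1 else 0.0,  # relative position
        "LL": len(line),                             # line length in characters
        "LC": len(line) - prev_len,                  # change from the previous line
        "WL_before": i > 0 and not lines[i - 1].strip(),
        "WL_after": i + 1 < len(lines) and not lines[i + 1].strip(),
        "CC": bool(HEADER_RE.match(line)),           # capital and colon use
    }


# Invented sample note, not real patient data.
note = [
    "CHIEF COMPLAINT:",
    "Chest pain for 2 days.",
    "",
    "MEDICATIONS:",
    "1. Aspirin 81 mg daily",
]
feats = line_features(note, 3)  # features for the "MEDICATIONS:" line
```

In a machine learning or hybrid pipeline, dictionaries like `feats` would typically be vectorized and passed to a classifier that labels each line as a section header or body text. Semantic features such as ST would additionally require a terminology resource, which this sketch does not attempt.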

ISSN: 1471-2288