Current approaches to identify sections within clinical narratives from electronic health records: a systematic review

Table 3 Machine learning studies

Reference	ML method	Training and test data set source	Training data set size	Test data set size	Method
Bramsen et al. [3]	AdaBoost	M	60	CV	ML
Haug et al. [17]	Bayesian Network	M	3483	CV	ML
Chen et al. [5], Dai et al. [7]	Conditional Random Fields	M, RB, CO	790	514	H
Deléger and Névéol [8]	Conditional Random Fields	M	100	600	ML
Ni et al. [40]	Conditional Random Fields and Maximum Entropy Classifier	M, AL	NS	NS	H
Jancsary et al. [20]	Conditional Random Fields and Viterbi	M, RB	2340	1003	H
Cho et al. [6]	Expectation Maximization Classifier	M, RB	NS	NS	H
Li et al. [29]	Hidden Markov Model and Viterbi	M, RB	7549	2130	ML
Lohr et al. [31]	Logistic Regression	M	1106	CV	ML
Ganesan and Subotin [16]	Logistic Regression and Viterbi	M, RB	1800	12502	H
Tepper et al. [57]	Maximum Entropy Classifier	M, CO	1365	374	ML
Sadoughi et al. [46]	Neural Network	M, RB	25842	2000	H
Apostolova et al. [1]	Support Vector Machine	M, RB	3000	200	H
Mowery et al. [39]	Support Vector Machine	M	50	CV	ML
Waranusast et al. [62]	Support Vector Machine and KNN	M	10694	CV	ML

NS=Not Specified; Training and Test Data Set Source: M=Manually created, RB=Using a rule-based approach, CO=Using a data set provided by competition organizers, AL=Using an active learning strategy; Test Data Set Size: CV=Cross Validation; Method: ML=Machine Learning, H=Hybrid

ISSN: 1471-2288