
Table 2 Mean performance for the ML algorithms trained on the Atrain dataset, evaluated on Atest

From: A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers

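| Models | Atest recall, M | Atest recall, SD | Atest precision, M | Atest precision, SD |
|---|---|---|---|---|
| Bernoulli | 0.01172 | 0.00079 | 0.01139 | 0.00096 |
| CT | 0.9841 | 0.016 | 0.9779 | 0.0059 |
| Bagged trees | 0.9809 | 0.012 | 0.9826 | 0.0080 |
| AdaBoost | 0.9839 | 0.011 | 0.9828 | 0.0075 |
| RF | **0.9853** | 0.011 | 0.9824 | 0.010 |
| SVM | 0.9821 | 0.017 | 0.9789 | 0.0068 |
| NNET | 0.9823 | 0.012 | **0.9843** | 0.0078 |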

Six machine learning algorithms were tested: Classification Tree (CT), Bagged trees, AdaBoost, Random Forest (RF), Support Vector Machine (SVM) and Neural Network (NNET). M, mean; SD, standard deviation. The highest mean values among the different algorithms are highlighted in bold.
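The two metrics in the table can be made concrete with a short sketch. It assumes the usual definitions for a binary match/non-match classifier, where a positive is a record pair predicted to refer to the same person. The functions `recall`, `precision` and `best_by` are hypothetical helpers written for illustration, not the authors' code. The mean values are copied from the table above and are used only to identify the model with the highest mean for each metric.

```python
from typing import Dict, Tuple


def recall(tp: int, fn: int) -> float:
    """Share of true matching record pairs that the classifier recovers."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of pairs predicted as matches that are true matches."""
    return tp / (tp + fp)


# (mean recall, mean precision) on Atest, as reported in Table 2.
TABLE_2: Dict[str, Tuple[float, float]] = {
    "Bernoulli": (0.01172, 0.01139),
    "CT": (0.9841, 0.9779),
    "Bagged trees": (0.9809, 0.9826),
    "AdaBoost": (0.9839, 0.9828),
    "RF": (0.9853, 0.9824),
    "SVM": (0.9821, 0.9789),
    "NNET": (0.9823, 0.9843),
}


def best_by(metric_index: int) -> str:
    """Model with the highest mean for one metric (0 = recall, 1 = precision)."""
    return max(TABLE_2, key=lambda model: TABLE_2[model][metric_index])


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print("recall(98, 2) =", recall(98, 2))
    print("Highest mean recall:", best_by(0))
    print("Highest mean precision:", best_by(1))
```

Run on the reported means, this selects RF for recall and NNET for precision, which matches the values marked in bold.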