Table 1 An example of a record pair comparison and its PRL likelihood score calculation

From: A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers

 

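| | CTR | NUMFAM | SUJID | GENDER | Yob | Mob | Dob | BRCA1 | BRCA2 | MUT_HGVS | PRL Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Individual GEMO_5789 | 1 | 17455 | 0001 | 2 | 1959 | 08 | 05 | 1 | 0 | c.3403C>T | |
| Individual GENEPSO_01082300001 | 1 | 08230 | 0001 | 2 | 1958 | 08 | 05 | 1 | 0 | c.3481_3491del | |
| Similarity s | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0.7825 | |
| f | 0.02272 | 0.00025 | 0.0018 | 0.5000 | 0.01098 | 0.07692 | 0.03125 | 0.3333 | 0.3333 | 0.0006 | |
| w | 5.45 | 11.95 | 9.1 | 0.99 | 6.49 | 3.68 | 4.99 | 1.57 | 1.57 | 10.69 | sum(w) = 56.48 |
| w*s | 5.45 | 0 | 9.1 | 0.99 | 0 | 3.68 | 4.99 | 1.57 | 1.57 | 8.36 | sum(w * s) = 35.71 |
| score S | | | | | | | | | | | 0.6322 |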

  1. Ten matching variables were used to identify record pairs: BRCA1 mutational status (BRCA1), BRCA2 mutational status (BRCA2), mutation description using the HGVS nomenclature (MUT_HGVS), gender (GENDER), recruiting center number (CTR), family number (NUMFAM), individual number in the family (SUJID), year of birth (Yob), month of birth (Mob) and day of birth (Dob). BRCA1 and BRCA2 matching variables: 1: “carrier of a BRCA1/2 mutation”, 0: “non-carrier of a BRCA1/2 mutation”. GENDER matching variable: 1: male, 2: female. The similarity vector s in the third row is used as input to the machine learning approaches. The PRL score S is calculated from the weights w and the similarities s as the ratio sum(w * s) / sum(w), which is consistent with the sums reported in the table (35.71 / 56.48).
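
The score calculation in this table can be retraced with a short Python sketch. It assumes that S is the weighted mean of the similarities, S = sum(w * s) / sum(w), which is what the table's sums suggest. The function name `prl_score` and the variable names are hypothetical and not taken from the article. The f row is not used here, because the table does not state how w is derived from f.

```python
# Weights w and similarities s copied from the table, in column order.
FIELDS = ["CTR", "NUMFAM", "SUJID", "GENDER", "Yob",
          "Mob", "Dob", "BRCA1", "BRCA2", "MUT_HGVS"]
w = [5.45, 11.95, 9.1, 0.99, 6.49, 3.68, 4.99, 1.57, 1.57, 10.69]
s = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0.7825]


def prl_score(weights, similarities):
    """Weighted mean of similarities (assumed formula: sum(w*s) / sum(w))."""
    if len(weights) != len(similarities):
        raise ValueError("weights and similarities must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sum of weights must be non-zero")
    weighted_sum = sum(wi * si for wi, si in zip(weights, similarities))
    return weighted_sum / total_weight


sum_w = sum(w)
sum_ws = sum(wi * si for wi, si in zip(w, s))
score = prl_score(w, s)

# Per-field contributions, mirroring the w*s row of the table.
for name, wi, si in zip(FIELDS, w, s):
    print(f"{name:>8}: w={wi:6.2f}  s={si:6.4f}  w*s={wi * si:6.2f}")
print(f"sum(w)={sum_w:.2f}  sum(w*s)={sum_ws:.2f}  S={score:.4f}")
```

With the rounded weights printed in the table, the result agrees with the reported 0.6322 to about three decimal places. Any difference in the fourth decimal presumably comes from rounding in the published values.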