
Table 1 P3RL: computational requirements of masking and shuffling, pre-processing (100,000 records), and linkage (100,000 records in table A and 50,000 records in table B)

From: Privacy Preserving Probabilistic Record Linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality

| Step | Plain linkage | P3RL: encrypted names | P3RL: encrypted dates |
|---|---|---|---|
| Mask and shuffle | - | 11 variables to mask and shuffle; 5 sec | 11 variables to mask and shuffle; 5 sec |
| Pre-processing (of the variables to be encrypted; the source data consist of 13 variables in total) | - | 3 name variables to pre-process → 6 pre-processed name variables built; 7 min 54 sec | 2 date variables to pre-process → 2 pre-processed date variables built; 35 sec |
| Encryption (the source data consist of 13 variables in total) | - | 4 name variables to encrypt (trigrams, 10 hash functions, bit array size 800); 1 min 08 sec | 2 date variables to encrypt → 4 date variables built (year kept in plain text); 15 sec |
| Linkage part 1: create pairs (filtered to reduce the potential pairs to 40 million) | 13 plain variables, 10 rules; 1 min 20 sec | 9 plain variables, 4 encrypted name variables; 1 min 33 sec | 13 plain variables, 2 encrypted date variables; 1 min 23 sec |
| Linkage part 2: apply rules (40 million pairs, 10 rules) | 63 min | 8 Bloom filter array comparisons (two 2x2 matrix rules); 110 min | 72 min |

  1. Tests were performed on a desktop computer with an Intel® Xeon® CPU (4 cores, 64-bit, 3 GHz), 12 GB RAM and the Windows 7 Professional 64-bit operating system. These estimates were derived using in-house software for masking and encryption, KNIME for pre-processing and G-LINK for linkage. G-LINK, developed by Statistics Canada, is the latest desktop version of its linkage software (formerly GRLS).
  2. Estimates may vary widely with other programs and/or hardware.
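The encryption row lists the Bloom filter parameters used for the name variables: trigrams, 10 hash functions and an 800-bit array. The Python sketch that follows shows one common way such an encoding and a similarity comparison can be done; it is not the authors' in-house software. The secret keys (`KEY1`, `KEY2`), the space padding of names, the HMAC-SHA1/HMAC-MD5 double-hashing scheme and the Dice coefficient used for comparison are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Parameters taken from Table 1: trigrams, 10 hash functions, 800-bit array.
NGRAM = 3
NUM_HASHES = 10
BIT_ARRAY_SIZE = 800

# Hypothetical secret keys (assumption); in practice they would be shared
# only between the data custodians who encrypt the names.
KEY1 = b"hypothetical-key-1"
KEY2 = b"hypothetical-key-2"


def ngrams(value: str, n: int = NGRAM) -> set[str]:
    """Split a lower-cased value into overlapping n-grams.

    Padding with one space on each side is an assumption, so that the first
    and last letters also appear in their own n-grams.
    """
    padded = f" {value.strip().lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def encode(value: str) -> set[int]:
    """Return the bit positions set to 1 in the Bloom filter for `value`."""
    bits: set[int] = set()
    for gram in ngrams(value):
        data = gram.encode("utf-8")
        h1 = int(hmac.new(KEY1, data, hashlib.sha1).hexdigest(), 16)
        h2 = int(hmac.new(KEY2, data, hashlib.md5).hexdigest(), 16)
        for i in range(NUM_HASHES):
            # Double hashing (assumed scheme): position_i = (h1 + i * h2) mod m
            bits.add((h1 + i * h2) % BIT_ARRAY_SIZE)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = encode("Smith")
    print(f"Smith vs smith: {dice(smith, encode('smith')):.2f}")
    print(f"Smith vs Smyth: {dice(smith, encode('Smyth')):.2f}")
    print(f"Smith vs Jones: {dice(smith, encode('Jones')):.2f}")
```

Representing each filter as the set of its 1-bit positions keeps the Dice computation short; a production implementation would more likely use a fixed-length bit array. Similar names such as Smith and Smyth share trigrams and therefore bit positions, which is what allows approximate comparison of values that are never revealed in plain text.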
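As a rough check on the scale involved, the figures in Table 1 imply the following derived arithmetic. This assumes that the pair-creation filter is applied to the full cross product of tables A and B and that the part 2 times are dominated by processing the 40 million retained pairs; neither assumption is stated explicitly in the table.

```latex
\begin{aligned}
|A \times B| &= 100\,000 \times 50\,000 = 5 \times 10^{9}\ \text{pairs} \\
\frac{4 \times 10^{7}}{5 \times 10^{9}} &= 0.008 = 0.8\,\%\ \text{of all pairs retained} \\
\text{Plain:}\quad \frac{4 \times 10^{7}\ \text{pairs}}{63 \times 60\ \text{s}} &\approx 1.06 \times 10^{4}\ \text{pairs/s} \\
\text{Encrypted names:}\quad \frac{4 \times 10^{7}\ \text{pairs}}{110 \times 60\ \text{s}} &\approx 6.1 \times 10^{3}\ \text{pairs/s} \\
\text{Encrypted dates:}\quad \frac{4 \times 10^{7}\ \text{pairs}}{72 \times 60\ \text{s}} &\approx 9.3 \times 10^{3}\ \text{pairs/s}
\end{aligned}
```

On these figures, applying the rules with encrypted names took roughly 1.75 times as long as plain linkage (110 vs 63 min), and with encrypted dates about 1.14 times as long (72 vs 63 min).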