Bloom Filters are a popular technique for privacy-preserving record linkage. However, recent work by Christen et al. [1] and others has shown that Bloom Filters (BFs) are susceptible to various forms of frequency attack. There are many ideas for hardening BFs against frequency attacks, and the one we will explore in this blog article is the use of Paillier encryption, a partially homomorphic encryption scheme, to encode Bloom Filters.

Here’s a basic setup for the two-party case.

The party X that wants to publish a dataset for matching by anyone first Paillier-encrypts the dataset using its own public key. It’s important to note that Paillier is a probabilistic encryption algorithm, so the same plaintext is turned into a different ciphertext each time we run the encrypt() function. This is also why a Paillier-encrypted dataset resists frequency attacks: identical plaintext values no longer show up as identical ciphertexts that an attacker could count.
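To make the probabilistic behaviour concrete, here is a minimal sketch of textbook Paillier in plain Python (3.8+). It is not a production implementation: the primes are tiny and hard-coded, and the function names `keygen`, `encrypt` and `decrypt` are my own choices for illustration rather than the API of any particular library.

```python
import secrets
from math import gcd

# Toy primes (both are Mersenne primes); far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut relies on g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n - 1]
        if gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2  # g^m * r^n mod n^2

def decrypt(n, priv, c):
    """Recover m from ciphertext c using the private key."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

n, priv = keygen()
c1, c2 = encrypt(n, 1), encrypt(n, 1)  # the same Bloom Filter bit, twice
print(c1 == c2)  # False, barring a negligible-probability collision
print(decrypt(n, priv, c1), decrypt(n, priv, c2))  # 1 1
```

Both ciphertexts decrypt to the same bit, yet they differ because each call draws a fresh random `r`.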

Any party Y that wants to match its own data against X’s dataset

- first downloads the encrypted database,
- forms the cross-product of the encrypted database with its own database,
- computes the pairwise Dice coefficients in encrypted form; the core of this computation is a dot product, which Paillier supports (multiplying an encrypted number by a plaintext scalar, and adding encrypted numbers),
- sends the matched dataset (id1, id2, encrypted_dice_coefficient) back to party X, who decrypts each encrypted Dice coefficient using X’s private key and sends back only the results where the Dice coefficient exceeds a certain threshold.
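The steps above can be sketched for a single pair of records as follows. This uses a toy Paillier written out in plain Python (3.8+) with tiny hard-coded primes, so it only illustrates the data flow. One detail is my own assumption rather than something the scheme above spells out: Paillier cannot divide ciphertexts, so in this sketch Y sends back the encrypted overlap and the encrypted total bit count separately, and X performs the division after decrypting. The helper name `enc_dot`, the example Bloom Filters and the threshold value are also hypothetical.

```python
import secrets
from math import gcd

P, Q = 2**31 - 1, 2**61 - 1  # toy Mersenne primes; far too small for real use

def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def enc_dot(n, enc_a, b):
    """Return Enc(sum_i a_i * b_i) from encrypted a_i and plaintext b_i."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_a, b):
        # pow(c, w) is scalar multiplication; the product is encrypted addition
        acc = acc * pow(c, w, n2) % n2
    return acc

# Party X: encrypt each Bloom Filter bit under X's public key and publish.
n, priv = keygen()
bf_x = [1, 0, 1, 1, 0, 1, 0, 0]
published = [encrypt(n, bit) for bit in bf_x]

# Party Y: works only with ciphertexts and its own plaintext Bloom Filter.
bf_y = [1, 1, 1, 0, 0, 1, 0, 0]
enc_overlap = enc_dot(n, published, bf_y)                  # Enc(|x AND y|)
enc_total = enc_dot(n, published, [1] * len(published))    # Enc(|x|)
enc_total = enc_total * encrypt(n, sum(bf_y)) % (n * n)    # Enc(|x| + |y|)

# Party X: decrypt, divide, and apply the threshold.
overlap = decrypt(n, priv, enc_overlap)
total = decrypt(n, priv, enc_total)
dice = 2 * overlap / total
THRESHOLD = 0.7  # hypothetical cut-off
print(overlap, total, dice, dice > THRESHOLD)  # 3 8 0.75 True
```

For a full dataset, Y would repeat its step for every pair in the cross-product, which is where the cost of this scheme comes from.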

That’s one scheme. If there is a way to do blocking, we don’t need to compute the expensive cross-product of the two datasets. We may use blocking keys on the original Bloom Filters if the blocking keys don’t reveal too much information.
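As a rough illustration of what blocking buys us, the sketch below pairs up only those records that share a blocking key. The record IDs and key strings are made up, and how a safe blocking key would actually be derived is left open here.

```python
from collections import defaultdict

def candidate_pairs(records_x, records_y):
    """Pair up record IDs that share a blocking key.

    Each input is a list of (record_id, blocking_key) tuples.
    """
    blocks = defaultdict(list)
    for rid, key in records_y:
        blocks[key].append(rid)
    return [(xid, yid) for xid, key in records_x for yid in blocks.get(key, [])]

# Hypothetical records and blocking keys.
records_x = [("x1", "S530"), ("x2", "J520")]
records_y = [("y1", "S530"), ("y2", "B600"), ("y3", "S530")]

pairs = candidate_pairs(records_x, records_y)
print(pairs)  # [('x1', 'y1'), ('x1', 'y3')]
print(len(pairs), "of", len(records_x) * len(records_y), "possible pairs")  # 2 of 6 possible pairs
```

Only the candidate pairs would then go through the encrypted Dice computation.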

A second scheme involves party X encrypting half of the variables, V, in its Bloom Filters using X’s public key, and party Y encrypting the other half (the complement of V) in its Bloom Filters using Y’s public key. Each party sends its encrypted half to the other party, which computes the encrypted partial dot products. They then exchange the encrypted values, decrypt them using their own private keys, and send the decrypted partial values to each other to sum up and obtain the final Dice coefficients. In this scheme, each party only ever receives Paillier-encrypted partial Bloom Filters from the other, which offers each party protection against frequency attacks by the other.
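Here is a minimal sketch of this second scheme for one pair of records, again using a toy Paillier in plain Python (3.8+) with tiny hard-coded primes. Several things are assumptions of mine for illustration: V is taken to be the first half of the bit positions, the helper `enc_dot` and the example Bloom Filters are hypothetical, and the two parties are assumed to share their total bit counts in the clear so that the Dice denominator can be formed.

```python
import secrets
from math import gcd

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def enc_dot(n, enc_a, b):
    """Return Enc(sum_i a_i * b_i) from encrypted a_i and plaintext b_i."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_a, b):
        acc = acc * pow(c, w, n2) % n2
    return acc

# Each party has its own key pair (toy Mersenne primes, not secure).
n_x, priv_x = keygen(2**31 - 1, 2**61 - 1)
n_y, priv_y = keygen(2**89 - 1, 2**107 - 1)

bf_x = [1, 0, 1, 1, 0, 1, 0, 0]  # X's Bloom Filter
bf_y = [1, 1, 1, 0, 0, 1, 0, 0]  # Y's Bloom Filter
half = len(bf_x) // 2
V = range(0, half)               # positions X encrypts (assumed split)
W = range(half, len(bf_x))       # complement of V, encrypted by Y

# Each party encrypts its half under its own public key and sends it over.
x_half = [encrypt(n_x, bf_x[i]) for i in V]
y_half = [encrypt(n_y, bf_y[i]) for i in W]

# Each party computes an encrypted partial dot product on the half it received.
enc_part_v = enc_dot(n_x, x_half, [bf_y[i] for i in V])  # computed by Y
enc_part_w = enc_dot(n_y, y_half, [bf_x[i] for i in W])  # computed by X

# The encrypted partials go back to the key owners, who decrypt them.
part_v = decrypt(n_x, priv_x, enc_part_v)
part_w = decrypt(n_y, priv_y, enc_part_w)

# The decrypted partials are exchanged and summed into the overlap.
overlap = part_v + part_w
dice = 2 * overlap / (sum(bf_x) + sum(bf_y))  # bit counts assumed shared in the clear
print(part_v, part_w, dice)  # 2 1 0.75
```

Neither party ever sees the other’s plaintext bits, only ciphertexts under a key it cannot decrypt and the final partial sums.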

[1] Christen et al., Pattern-Mining based Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage, PAKDD, 2018.