Linking Integer Records: The Simplest Case of PPRL

Privacy-Preserving Record Linkage (PPRL) is one of those problems that still doesn’t have a solid and widely accepted mathematical definition, perhaps because Record Linkage itself, especially the kind that doesn’t reduce to supervised learning through an abundance of labelled matches, still lacks one despite thousands of papers published …

Hardening Bloom Filters using Paillier Encryption

Bloom filters are a popular technique for privacy-preserving record linkage. However, recent work by Christen et al. [1] and others has shown that Bloom filters (BF) are susceptible to different forms of frequency attack. There are many ideas on hardening BF to protect against frequency attacks, and one idea we will explore in this blog article …
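For readers new to the encoding, here is a minimal sketch of one common Bloom filter construction for PPRL: each value is split into character bigrams, and every bigram sets k bit positions in an m-bit filter via keyed hashes. This is not necessarily the construction used in the post; the key, the defaults `m=1000` and `k=10`, the HMAC-SHA256 double hashing, and the helper names are all illustrative assumptions. It also shows why frequency attacks work: the encoding is deterministic, so a common name always yields the same, equally common, filter.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Lower-case and pad the value, then split it into character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, m: int = 1000, k: int = 10,
                 key: bytes = b"hypothetical-shared-key") -> frozenset[int]:
    """Return the set bit positions of an m-bit Bloom filter for `value`.

    Each bigram sets k positions using double hashing, (h1 + i*h2) mod m,
    where h1 and h2 are two HMAC-SHA256 digests under a (hypothetical) key
    shared by the linking parties.
    """
    bits = set()
    for gram in bigrams(value):
        h1 = int.from_bytes(hmac.new(key, b"1" + gram.encode(), hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, b"2" + gram.encode(), hashlib.sha256).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return frozenset(bits)


def dice(a: frozenset[int], b: frozenset[int]) -> float:
    """Dice coefficient between two filters, used as the match score."""
    return 2 * len(a & b) / (len(a) + len(b))


smith, smyth, jones = bloom_encode("Smith"), bloom_encode("Smyth"), bloom_encode("Jones")
print(f"Smith vs Smyth: {dice(smith, smyth):.2f}")
print(f"Smith vs Jones: {dice(smith, jones):.2f}")
```

Similar spellings share most bigrams and therefore most bits, which is what makes approximate matching on encoded data possible. The same determinism means an attacker who sees many filters can align the most frequent ones with the most frequent names in a public list, which is the weakness hardening schemes try to address.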

Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases

My colleagues and I have just published on arXiv a simple but highly effective Entity Resolution algorithm that can scale to billions of records and handle significant data quality issues. The paper is titled Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases and it is an extension of our previous paper on linking millions of addresses …

Practical Algorithms for Distributed Privacy-Preserving Risk Modelling

In a previous post on the problem of detecting complex financial crimes, I described the following basic technology framework for financial intelligence units (FIUs) and their partner agencies and reporting entities (REs) to engage in collaborative but privacy-preserving and distributed risk modelling using confidential computing technologies. In this post, I describe a few concrete algorithms that …

How to Quickly and Meaningfully Improve the Financial System’s Collective Ability to Detect Crimes

Complex financial crimes are hard to detect primarily because data related to different pieces of the overall puzzle are usually distributed across a network of financial institutions, regulators, and law-enforcement agencies. The problem is also rapidly increasing in complexity because new platforms are emerging all the time that facilitate the transfer of value across a …

How to Link Millions of Addresses with Ten Lines of Code in Ten Minutes

Solving big hairy problems like detecting complex financial crimes requires solving a series of smaller, mundane but technically non-trivial problems. Performing efficient record linkage on large databases with tens to hundreds of millions of rows of data is one such pesky problem. A few of my colleagues have just made a small dent in the overall …