FinTracer and Friends

About 5 years ago, Tania Churchill and I assembled a team of researchers and engineers across AUSTRAC and ANU to work on privacy technologies for detecting criminal activities across the financial system, funded by the Fintel Alliance Expansion budget measure, the Investigative Analytics NPP (led by CSIRO’s Data61), and an ANU Translational Fellowship. The overall R&D program was summarised in these two articles, the latter by Nathan Lynch from Thomson Reuters, author of The Lucky Laundry:

There are basically three fundamental computational problems to solve, in increasing order of difficulty:

  • How can two financial intelligence units, each of which has a list of suspicious entities, share only the common suspicious entities with each other, without sharing their entire list with each other? More generally, how can such sharing be done between multiple financial intelligence units?
  • Given a criminal typology that can be described as a graph pattern, where nodes denote entities and edges denote relationships, how can we detect all entities that satisfy the criminal typology when the underlying graph span multiple financial institutions?
  • How can we perform model learning, either supervised or unsupervised, over a set of entities whose financial data and behaviour are spread across multiple financial institutions?

The first problem can be solved using private set intersection algorithms, either exact match or approximate match, and there are a number of commercial-grade solutions that do exactly that, including the Ma3tch algorithm used in the FIU.net system in Europe.

The second problem is challenging to solve at scale and I am really pleased to see the FinTracer algorithm, the core technology developed by my former AUSTRAC colleague Michael Brand, finally released for public use here:

The algorithm is basically a way to perform efficient and privacy-preserving breadth-first search across a large distributed graph in homomorphically-encrypted space (using El-Gamal), allowing Financial Intelligence Units like AUSTRAC to work with law-enforcement agencies and financial institutions to start with some known suspicious entities and find their footprint across the financial system without

  • having to disclose the suspicious entities to financial institutions, thus avoiding the risk of accidentally tipping off the entities;
  • having to violate the privacy of unrelated entities even though their data need to be processed.

The key observation behind this algorithm is that graph traversal can be expressed as simple matrix multiplication operations, and that large distributed graphs can be represented using sparse matrix representations. This basic graph search algorithm can serve as a building block to detect a large range of financial crime typologies (i.e. graph patterns), thus significantly advancing the state-of-the-art in intelligence agency’s ability to detect complex financial crimes in a responsible, privacy-preserving manner at a large scale.

The third problem of performing federated model learning on data that are spread across multiple financial institutions needed a scalable privacy-preserving probabilistic data-matching algorithm. Hamish Ivey-Law and I have a proposed solution that performs the P-Sig algorithm (by Yuhang Zhang) in homomorphically-encrypted space, but the resultant algorithm is arguably too complex and computationally intensive to work at the scale of Australia’s financial system — we need to be able to match two datasets that are at least 20 million in size each — so we are still thinking about it, but a draft paper is available to anyone who is interested.

In the mean time, we have a pretty good proposal for the problem of solving the fairness problem in federated learning, where the quality of the model obtained by a party involved in federated model learning is commensurate with the actual contributions made by the party. The algorithm is described here:

We also have a general Bayesian filtering algorithm on graphs that can continually collect observations from the financial system, possibly using some of the above privacy technologies, and use the observations to update and track the general “health” of the financial system:

This filtering algorithm has subsequently been generalised into a Differentially Private Reinforcement Learning algorithm over Population Processes by my phd student, which allows accurate tracking and decision-making even when observations are collected interactively from an underlying population with differential privacy noise, for sufficiently large population sizes.

In several of the algorithms above, variations of Private Set Intersection (PSI) is sometimes required as subroutines. Given PSI can used in an adversarial way to reconstruct a private database through interactive queries, a final original research worth mentioning is the Differentially Private Set Intersection Cardinality algorithm designed by Michael Purcell that can be used during exploratory private data analysis:

In addition to the above work, the research team over at ANU and Data61 also produced these two comprehensive and pedagogically useful papers on Homomorphic Encryption and Differential Privacy on Graph Data, both of which are foundational privacy technologies in our general application domain:

We still have a couple of papers up our sleeves that are going through peer review, but I think it’s worth sharing this body of work as a somewhat coherent whole at this point. Most of the techniques described above have applications beyond financial intelligence, and this body of work is also a good demonstration that the Australian public service can do world-class innovation work, especially in collaboration with academic and industry partners. The collaboration model is not via the usual R&D grant approach — I have always struggled with that model — but via a combination of co-employment of a few key staff, and loose collaboration on topics of common interest over an extended period of time on an enduring business problem. This model is not perfect by any means — there were long stretches where I had deep doubts, shall we say — but it’s an alternative to the normal 1-2 years R&D grant approach worth considering.


Leave a comment