Data Security vs Cyber Security

Cyber security and data security are closely related concepts that operate at different levels and provide different safeguards. Cyber security is primarily about controlling access to systems and data through different security protection mechanisms, from the physical network layer all the way to the application layer. These security mechanisms come primarily in the form of …

What Can Differential Privacy Actually Protect?

Differential Privacy (DP) is, by now, the most widely adopted formal model of privacy protection used in industry [L23] and government [ABS22], but my sense is that its “semantics”, especially in the presence of correlated data and in the adversarial interactive setting, is still not broadly understood in the community, especially among practitioners. In the …

How To Deal with Database Reconstruction Attacks

I have been thinking about data security issues, in particular database-reconstruction attacks. To quote Wikipedia, a reconstruction attack is any method for partially reconstructing a private database from public aggregate information. The question I am specifically interested in is this: Can an attacker with general interactive query access to a dataset recover a piece of …
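To make the idea of reconstruction from aggregate information concrete, here is a minimal sketch of the simplest such attack, a differencing attack, in which two sum queries that differ by one person reveal that person's value. It is an illustration only and not the method discussed in the post; the dataset `SALARIES` and the functions `sum_query` and `differencing_attack` are hypothetical names invented for this example.

```python
# Hypothetical toy dataset (name -> salary); all values are made up.
SALARIES = {"alice": 82000, "bob": 67000, "carol": 91000, "dave": 74000}


def sum_query(names):
    """Aggregate-only interface: return the total salary of the named group."""
    return sum(SALARIES[name] for name in names)


def differencing_attack(query, everyone, target):
    """Recover one person's value from two aggregate answers.

    The attacker never sees an individual record: they ask for the total
    over everyone, then the total over everyone except the target, and
    subtract the second answer from the first.
    """
    return query(everyone) - query(everyone - {target})


if __name__ == "__main__":
    everyone = set(SALARIES)
    print(differencing_attack(sum_query, everyone, "carol"))
```

The sketch assumes the interface answers exact sums over arbitrary groups; restricting group sizes or perturbing answers changes what an attacker can recover.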

Bayesian Filtering on Structured Environments

A few colleagues and I have just completed a new research paper titled Factored Conditional Filtering: Tracking States and Estimating Parameters in High-Dimensional Spaces. The research took over 3 years and I am really excited about the underlying theory and its possible applications. In particular, the paper shows how we can ‘lift’ Bayesian filtering to …

A Note on Large Scale Data Matching and Entity Resolution

Data matching and entity resolution is a common first step in data preparation, and there are a thousand academic papers on the subject in the literature. In practice, for large datasets – anything more than a million records will do as a definition of large here because most data-matching algorithms can’t handle that because …

Towards Fair and Privacy-Preserving Federated Deep Learning Models

My former postdoc Lingjuan Lyu has been working with a few research collaborators on a fair and privacy-preserving federated deep-learning framework, and a paper describing the framework has just been published in the IEEE Transactions on Parallel and Distributed Systems. Here are the paper details: Title: Towards Fair and Privacy-Preserving Federated Deep Models Abstract: The current …

Distributed Privacy-Preserving Prediction

Another day, another paper, this time by my postdoc Lingjuan Lyu and a few collaborators. Here’s the abstract: In privacy-preserving machine learning, individual parties are reluctant to share their sensitive training data due to privacy concerns. Even the trained model parameters or prediction can pose serious privacy leakage. To address these problems, we demonstrate a …

Accurate and Efficient Privacy-Preserving String Matching

A few ANU colleagues and I have just completed a paper on a suffix-tree-based algorithm for computing the longest common substring of two strings in a privacy-preserving manner. Here’s the abstract: The task of calculating similarities between strings held by different organizations without revealing these strings is an increasingly important problem in areas such as …

Linking Integer Records: The Simplest Case of PPRL

Privacy-Preserving Record Linkage (PPRL) is one of those problems that still doesn’t have a solid and widely accepted mathematical definition, perhaps because the problem of Record Linkage itself, especially the kind that doesn’t reduce to supervised learning through an abundance of labelled matches, still doesn’t have a solid mathematical definition despite thousands of papers published …

Hardening Bloom Filters using Paillier Encryption

Bloom Filters (BF) are a popular technique for privacy-preserving record linkage. However, recent work by Christen et al. [1] and others has shown that Bloom Filters are susceptible to different forms of frequency attack. There are many ideas on hardening BF to protect against frequency attacks, and one idea we will explore in this blog article …
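For readers unfamiliar with how Bloom Filters are used in record linkage, the following is a minimal sketch of the common approach of hashing a string's character bigrams into a bit array and comparing two encodings with the Dice coefficient. It shows only the basic, unhardened encoding and does not include the Paillier-based hardening the post goes on to discuss. The function names, the `_` padding character, and the parameters `num_bits` and `num_hashes` are assumptions chosen for illustration.

```python
import hashlib


def bigrams(text):
    """Return the set of character bigrams of a padded, lower-cased string."""
    padded = f"_{text.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(text, num_bits=1000, num_hashes=2):
    """Encode a string as the set of bit positions set in a Bloom filter.

    Each bigram is hashed num_hashes times (SHA-256 with a seed prefix)
    and each hash sets one position. The parameter values are illustrative.
    """
    positions = set()
    for gram in bigrams(text):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice(a, b):
    """Dice coefficient between two sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = bloom_encode("smith")
    print(dice(smith, bloom_encode("smyth")))
    print(dice(smith, bloom_encode("jones")))
```

Because the encoding is deterministic, a given bigram always sets the same bit positions, which is the regularity that frequency attacks exploit.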