Data Security vs Cyber Security

Cyber security and data security are closely related concepts that operate at different levels and provide different safeguards. Cyber security is primarily about controlling access to systems and data through different security protection mechanisms, from the physical network layer all the way to the application layer. These security mechanisms come primarily in the form of …

Secure and Ephemeral AI Workloads in Data Mesh Environments

A colleague and I have just released on arXiv a paper titled “Enabling Secure and Ephemeral AI Workloads in Data Mesh Environments”. The key innovation is in pushing the now well-established idea of minimal immutable data structures up and down the software infrastructure stack a bit further than what others have done, resulting in a …

Winners and Losers in the AI Commercial Landscape

With NVIDIA seemingly steaming ahead in their latest quarterly result, Apple Intelligence receiving a lukewarm response from users, Wall Street increasingly worried about the return-on-investment from the hyperscalers’ massive capital investments, stories that CIOs are struggling to find ROI for AI, and news in the last two days that Intel and Samsung are both struggling …

How To Deal with Database Reconstruction Attacks

I have been thinking about data security issues, in particular database-reconstruction attacks. To quote Wikipedia, a reconstruction attack is any method for partially reconstructing a private database from public aggregate information. The question I am specifically interested in is this: Can an attacker with general interactive query access to a dataset recover a piece of …

Influence Flower

Regular users of arXiv.org may have noticed that on every paper’s page, under the Related Papers tab, one can now find the paper’s Influence Flower, which is a nice way to visualise citation influences among academic entities, including papers, authors, institutions, and research topics. The following, for example, are the author-centric and venue-centric influence flowers …

A Note on Large Scale Data Matching and Entity Resolution

Data matching and entity resolution is a common first step in data preparation, and there are a thousand academic papers on the subject. In practice, for large datasets – anything more than a million records will do as a definition of large here because most data-matching algorithms can’t handle that because …

Large-Scale Distributed Analytics: A Research Program

Since starting my part-time appointment as an associate professor at the Australian National University, I have been thinking about spending more time on fundamental research. As Don Knuth counsels, “if you find that you’re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you …

Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases

My colleagues and I have just published on arXiv a simple but highly effective Entity Resolution algorithm that can scale to billions of records and handle significant data quality issues. The paper is titled Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases and it is an extension of our previous paper on linking millions of addresses …