Data Science – Mental Models 4 Life

Martingale Tests for Model Misspecification in Bayesian Sequence Prediction

May 9, 2026

Using sequential hypothesis testing techniques to check the modelling assumptions of Bayesian mixture estimators is a promising way of getting value out of combining the Bayesian and frequentist approaches to probability. Here’s a paper to show how that can be done for Context Tree Weighting and related methods. Paper Abstract: Universal Bayesian sequence predictors like …

Notes on Conformal Prediction and Testing

February 8, 2026

All my life I have been searching for simple and effective methods for constructing prediction intervals for different AI/ML models. I don’t know why I never encountered Conformal Prediction until recently, but I suppose it is better late than never. Conformal prediction is (arguably) the most elegant and practical technique for improving the robustness in …

On the Semantics of Differential Privacy and Its Responsible Use

September 15, 2025

Differential Privacy (DP) is one of the most widely adopted formal models of privacy protection, but its semantics, especially in the presence of correlated data and in the adversarial interactive setting, is still not broadly understood among data science practitioners. In this paper, we first look at how DP originated from research on database-reconstruction attacks …

Secure and Ephemeral AI Workloads in Data Mesh Environments

June 3, 2025

A colleague and I have just released on arXiv a paper titled “Enabling Secure and Ephemeral AI Workloads in Data Mesh Environments”. The key innovation is in pushing the now well-established idea of minimal immutable data structures up and down the software infrastructure stack a bit further than what others have done, resulting in a …

Approximating Solomonoff Induction

November 23, 2024

As is well-known by now, the universal AI agent AIXI is made up of two key components: Solomonoff Induction for universal sequential prediction, and expectimax search for planning. There are several proposed and reasonably effective approximations of the Solomonoff Induction component using the factored, binarised Context Tree Weighting algorithm [WST95, VNHUS09] and its generalisation to …

Natural Exponential Functions in Inequalities

October 16, 2024

Have you ever wondered why the natural exponential function shows up so frequently in mathematical inequalities? Here’s a graph of the natural exponential function. The constant e has a special place in mathematics, which is beautifully chronicled in Eli Maor’s book [M94]. The definition of e that is most useful and intuitive for our purpose …

Dealing with Linkage Attacks using Differential Privacy

September 13, 2024

A key claim of differential privacy in [DR14] is that it provides “automatic neutralization of linkage attacks, including all those attempted with all past, present, and future datasets and other forms and sources of auxiliary information”. This is an important and often repeated claim (see e.g. [N17, Section E] and [PR23]), but the …

Privacy Technologies for Financial Intelligence

August 20, 2024

It took a little while to write, but hopefully the following survey paper by Yang Li, Thilina Ranbaduge and yours truly can help demystify financial intelligence and privacy technologies for practitioners and technologists alike. The focus is on anti-money laundering and counter-terrorism financing, but the opportunity set is much broader. https://arxiv.org/abs/2408.09935 Here’s the abstract of …

How To Deal with Database Reconstruction Attacks

March 9, 2024

I have been thinking about data security issues, in particular database-reconstruction attacks. To quote Wikipedia, a reconstruction attack is any method for partially reconstructing a private database from public aggregate information. The question I am specifically interested in is this: Can an attacker with general interactive query access to a dataset recover a piece of …

Influence Flower

December 25, 2023

Regular users of arXiv.org may have noticed that on every paper’s page, under the Related Papers tab, one can now find the paper’s Influence Flower, which is a nice way to visualise citation influences among academic entities, including papers, authors, institutions, and research topics. The following, for example, are the author-centric and venue-centric influence flowers …