Data Science – Page 2 – Mental Models 4 Life

FinTracer and Friends

June 25, 2023

About 5 years ago, Tania Churchill and I assembled a team of researchers and engineers across AUSTRAC and ANU to work on privacy technologies for detecting criminal activities across the financial system, funded by the Fintel Alliance Expansion budget measure, the Investigative Analytics NPP (led by CSIRO’s Data61), and an ANU Translational Fellowship. The overall …

Split Count and Share: A Differentially Private Set Intersection Cardinality Algorithm

June 11, 2023

My colleagues Mike Purcell, Kelvin Yang Li and I have had a new paper on a differentially private set intersection cardinality algorithm accepted at this year’s Uncertainty in Artificial Intelligence conference. Here is the abstract: We describe a simple two-party protocol in which each party contributes a set as input. The output of the protocol is an estimate …

A Map of Machine Learning Principles and Algorithms

April 25, 2023

Here is my attempt to map out the major classes of algorithms in Machine Learning, organised around the associated induction principles and learning theory. The usual caveats apply around this being biased towards my own experience. At the highest level, we can distinguish between the Passive and Active learning settings. In the passive case, the …

What Are Data Products?

April 9, 2023

I realised recently that I don’t have a good working definition of what Data Products are or should be. Sure, a quick googling will surface many generic definitions of Data Products from respected sources like Forbes and McKinsey. They are all usually variations of DJ Patil’s definition: “A data product is a product that facilitates …

A Direct Approximation of AIXI using Logical State Abstractions

October 15, 2022

Artificial Intelligence as a well-defined mathematical problem was solved a number of years ago through the formulation of the AIXI agent by Prof Marcus Hutter (see https://theconversation.com/to-create-a-super-intelligent-machine-start-with-an-equation-20756 for a quick introduction), but a fundamental issue with the AIXI theory has always been the incomputability of the general solution. In a continuation of …

Bayesian Filtering on Structured Environments

June 12, 2022

A few colleagues and I have just completed a new research paper titled Factored Conditional Filtering: Tracking States and Estimating Parameters in High-Dimensional Spaces. The research took over 3 years and I am really excited about the underlying theory and its possible applications. In particular, the paper shows how we can ‘lift’ Bayesian filtering to …

A Note on Large Scale Data Matching and Entity Resolution

April 6, 2022

Data matching and entity resolution is a common first step in data preparation, and there are a thousand academic papers on the subject. In practice, for large datasets – anything more than a million records will do as a definition of large here, because most data-matching algorithms can’t handle that because …

Private Graph Data Release using Differential Privacy

July 13, 2021

A few colleagues and I have just put on arXiv a new survey paper on Private Graph Data Release, which took us nearly 9 months to write. Here’s the abstract: The application of graph analytics to various domains have yielded tremendous societal and economical benefits in recent years. However, the increasingly widespread adoption of graph …

Unsupervised 3D Object Segmentation

June 13, 2021

One of my PhD students has just released a paper titled Spatially Invariant Unsupervised 3D Object Segmentation Using Graph Neural Networks. Here’s the abstract: In this paper, we tackle the problem of unsupervised 3D object segmentation from a point cloud without RGB information. In particular, we propose a framework, SPAIR3D, to model a point cloud …

Machine Learning: A Broad Church

September 1, 2020

I am sometimes asked what the difference is between Machine Learning (ML) and X, where X is one of a number of things like Statistics, Evolutionary Computing, Control Theory, etc. A variation of the question is which problem classes can be tackled by both ML and non-ML techniques, and what are the pros …