A Note on Large Scale Data Matching and Entity Resolution

Data matching and entity resolution are a common first step in data preparation, and there are a thousand academic papers on the subject in the literature. In practice, for large datasets – anything more than a million records will do as a definition of large here because most data-matching algorithms can’t handle that because …

Private Graph Data Release using Differential Privacy

A few colleagues and I have just put on arXiv a new survey paper on Private Graph Data Release, which took us nearly 9 months to write. Here’s the abstract: The application of graph analytics to various domains have yielded tremendous societal and economical benefits in recent years. However, the increasingly widespread adoption of graph …

Unsupervised 3D Object Segmentation

One of my PhD students has just released a paper titled Spatially Invariant Unsupervised 3D Object Segmentation Using Graph Neural Networks. Here’s the abstract: In this paper, we tackle the problem of unsupervised 3D object segmentation from a point cloud without RGB information. In particular, we propose a framework, SPAIR3D, to model a point cloud …

Understanding Industry Structure and Competition

While going through some old documents this week, I rediscovered a set of notes I took while reading Michael Porter many years ago. On rereading the notes, I find (again) his ideas to be really simple and compelling, worthy of sharing as a mental model for both practising data scientists and value investors. A …