Data Science – Page 6 – Mental Models 4 Life

Agile Data Science: The Point of It All

November 29, 2016

Slightly less than 5 years ago, I had the fortune of being hauled in front of the then richest man in Asia and asked to talk about data science. That was the very first time I had met a billionaire in person so it was kind of exciting. Two things happened: I gave my worst …

Large-scale Subscriber Preference Modelling for Telcos – Part 2

November 5, 2016

In Part 1 of this blog article, we looked at the problem of tokenising a URL as an intermediate step towards learning user preference models from browsing histories. In Part 2, we next look at the problem of learning a URL classifier model from the preprocessed Shalla dataset using Support Vector Machines. A standard way …

Large-scale Subscriber Preference Modelling for Telcos – Part 1

October 30, 2016

An important way telcos can increase revenue is to improve, within the constraints of privacy laws, provision of personalised services for subscribers. To achieve that, they need to be able to build good subscriber preference models. These can take a number of forms, depending on the specific business context and the exact data available. In this …

Apache Spark vs MPP Databases

August 14, 2016

Everything that is old is new again. That’s the feeling I get when I look at Spark, which I learned is one of the fastest-growing Apache projects in the big data space. There is a remarkable similarity between the underlying architecture of Spark and that of a Massively Parallel Processing (MPP) Database like Greenplum or …

A Derivation of the Kalman Filter

July 24, 2016

The Kalman Filter is one of the more useful tools in data science, but while there are a lot of well-written descriptions of the Bayesian tracking technique available online and in technical books/articles, for some reason it’s hard to find a simple derivation of the Kalman Filter from first principles. In this short note, I show how …

Agile Data Science: Start with An Action

May 28, 2016

The life of a sea squirt has an important lesson for data science. For those who haven’t heard, sea squirts come to life as larvae that swim freely around. In that state, however, they are not capable of feeding so they will soon settle to the bottom of the ocean and cement themselves headfirst to …

Data Acquisition for Shopping Research

April 3, 2016

Have you noticed that more and more shopping malls are offering free wifi? At first sight that appears to be a puzzling investment, especially in this day and age when everyone has a data subscription on their smartphone. But then you start noticing that it’s quite hard to get good signal coverage from your telco inside those malls that offer …

Building a Data Science Practice

March 28, 2016

I spent five years working as a consulting data scientist for EMC/Pivotal. In that time, I had the chance to help several organisations in Asia Pacific set up their data science practices. Most of these organisations are large traditional enterprises with entrenched corporate practices. In each case, I must admit I wasn’t as successful …

Privacy Preserving Outlier Detection: A Tutorial

March 4, 2016

Outlier detection is an important tool in risk modelling. Where data are distributed across multiple locations and data privacy is a concern, we need to start looking at privacy-preserving techniques for outlier detection. Linked here is a tutorial introduction to this topic that I recently prepared: Privacy Preserving Outlier Detection. The presentation …