In-Database Machine Learning Illustrated

I have just received the excellent news that Apache MADlib, a big data machine learning library for which I was a committer until recently, has graduated to become a top-level Apache project. The basic idea behind MADlib is actually quite interesting and deserves to be more widely known. Massively Parallel Processing (MPP) databases like Greenplum have …

Large-scale Subscriber Preference Modelling for Telcos – Part 2

In Part 1 of this blog article, we looked at the problem of tokenising a URL as an intermediate step towards learning user preference models from browsing histories. In Part 2, we look at the problem of learning a URL classifier model from the preprocessed Shalla dataset using Support Vector Machines. A standard way …

Large-scale Subscriber Preference Modelling for Telcos – Part 1

An important way telcos can increase revenue is to improve, within the constraints of privacy laws, the provision of personalised services for subscribers. To achieve that, they need to be able to build good subscriber preference models. These can take a number of forms, depending on the specific business context and the exact data available. In this …