Large-scale Subscriber Preference Modelling for Telcos – Part 2

In Part 1 of this blog article, we looked at the problem of tokenising a URL as an intermediate step towards learning user preference models from browsing histories. In Part 2, we look at the problem of learning a URL classifier model from the preprocessed Shalla dataset using Support Vector Machines. A standard way …

Large-scale Subscriber Preference Modelling for Telcos – Part 1

An important way telcos can increase revenue is to improve, within the constraints of privacy laws, provision of personalised services for subscribers. To achieve that, they need to be able to build good subscriber preference models. These can take a number of forms, depending on the specific business context and the exact data available. In this …

Automatic Data Integration using Normalised Compression Distance

Whenever two organisations come together to share data, we have a data integration problem. Mapping of datasets is typically done manually and that can be a labour-intensive and error-prone process. Importantly, the manual data-mapping process doesn’t scale and that is a problem when you want to build an information-sharing network where arbitrary organisations can sign …

Privacy Preserving Outlier Detection: A Tutorial

Outlier detection is an important tool in risk modelling. In the context where data are distributed across multiple locations and data privacy is a concern, we need to start looking at privacy-preserving techniques for doing outlier detection. Linked here is a tutorial introduction to this topic I recently prepared: Privacy Preserving Outlier Detection. The presentation …