Unifying Logic and Probability for Learning

Unifying logic and probability is an active research topic of broad interest. The literature contains many proposals for probabilistic logics, each with a different motivation, either computational or philosophical, and a different system of syntax and semantics. This state of affairs is confusing and unsatisfactory, especially in view …

The Competitive Moat of Google, Facebook and Other Data Owners

When I was building the Data Science Centre of Excellence at Reliance Industries, I once interviewed a world-class researcher who had worked at multiple institutions, including Yahoo and Google, and he told me something I hadn’t understood until then. I thought a company like Google had a weak business moat because it is vulnerable to …

A Short Course on Statistical Learning

Here is a short (and somewhat unusual) course on statistical machine learning that I have delivered multiple times over the last few years. It covers: Introduction to Statistical Learning Theory; Bayesian Probability Theory; Sequence Prediction and Data Compression; Bayesian Networks. In designing this course, I have deliberately steered away from the usual practice of giving students a (long) …

How to Prove It

A major deficiency in many university-level computer science programs is the neglect of training in fundamental mathematical skills. This deficiency usually rears its head when a CS student first moves into an area like Data Science and quickly realises s/he does not even have the ability to fully understand papers and books in the field, let alone contribute …

Online Support Vector Machines

I have been studying and experimenting with online learning algorithms for support vector machines (SVMs) for a while now, primarily with the intention of understanding how they can be used to learn SVM models on large multi-terabyte datasets. The following technical report describes the NORMA and PEGASOS families of algorithms and gives some observations and relevant …

Quantifying the Accuracy of Business Rules

Telcos everywhere are working on initiatives to better monetise their data. For many of them, a key challenge in addressing customer requirements is a lack of labelled data. For example, a customer may come along and make a request: “Tell me something about the shopping behaviour of housewives in the country”. This seemingly simple question is actually …