Agile Data Science: Don’t Let Your Model Die in a PowerPoint Presentation

Most data science projects are doomed to failure before they even start. There are a couple of reasons. The aspiring data scientist and management may be drawn to a sexy problem rather than an important problem. The full range of data required to do a complete analysis may be inaccessible or even non-existent. And even …

The Education of a Data Scientist: On Sands and Other Irritants

I have learned over the years to distinguish between good data scientists and great data scientists in the way they handle the seemingly mundane aspects of data analysis, tasks like loading large but poorly structured datasets, dealing with missing data or poor quality data, finding the right way to interrogate and transform variables to satisfy …

In-Database Machine Learning Illustrated

I have just received the excellent news that Apache MADlib, a big data machine learning library for which I was a committer until recently, has graduated to become a top-level Apache project. The basic idea behind MADlib is actually quite interesting and deserves to be more widely known. Massively Parallel Processing (MPP) databases like Greenplum have …

Setting up a Data Science Practice: Analytics Processes

In this third post on setting up a data science practice, I address some of the analytics processes that need to be in place to maximise value from analytics. After more than two decades of practice and development, there are now well-established data analytics frameworks like the Cross Industry Standard Process for Data Mining. …

Setting up a Data Science Practice: People Dimension

In the previous post, we discussed the key principles of setting up a data science practice. In this post, we’ll discuss the people dimension. One should read the below as suggestions, not prescriptions. There is more than one way to set up a data science practice. Critical to the success of a data science practice are …

Setting up a Data Science Practice: Fundamental Principles

I have been involved in the setup of several data science practices in both industry and government. Here are a few key principles I use in establishing a data science practice. Principle 1: Building a predictive enterprise is, first and foremost, about building a human infrastructure. Many companies mistakenly believe that analytics is primarily about software …

Customer Lifetime Value and Its Application in Retail Analytics

Customer Lifetime Value (CLV) is a relatively new framework stemming from the idea of “treatment of customers as an asset”, in use at innovative companies like Harrah’s, IBM, and Capital One. The definition is a fairly natural one: CLV is the net present value of profit from all the future purchases a customer is going to …