Agile Data Science: A Portfolio Approach to Managing An Analytics Team

I recently concluded a two-year stint managing a team of ten highly skilled analytics professionals spread across three locations. There were, of course, many challenges, but the team over-achieved on just about every measure of success one can imagine. The team’s wins include completing, on time and under budget, a data-matching project that delivers tens of …

Ariely’s Mental Model on Dishonesty

Dan Ariely is always worth reading. I picked up The (Honest) Truth About Dishonesty over the Christmas break, and it did not disappoint. The key findings of Ariely’s work in this area are summarised in the following diagram, which lists some of the factors and forces that experimental studies have shown to shape dishonesty, both …

Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases

My colleagues and I have just published on arXiv a simple but highly effective entity resolution algorithm that can scale to billions of records and handle significant data quality issues. The paper, titled Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases, extends our previous paper on linking millions of addresses …