After a long bull market, most investors are sitting on healthy gains, and a fair few are probably twiddling their thumbs wondering what to do next, if anything. There is plenty of talk about an imminent crash. CAPE valuations are high, very high. VIX is low, very low. The only reasonable way to justify current … More Hanging to Your Gains in a Lofty Market
My colleagues and I have just published a paper on arXiv describing a simple but highly effective Entity Resolution algorithm that can scale to billions of records and handle significant data quality issues. The paper is titled Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases, and it extends our previous paper on linking millions of addresses … More Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases
There are many gems in the book Team of Teams: New Rules of Engagement For a Complex World, written by General Stanley McChrystal and co-authors based on McChrystal’s experience leading the Joint Special Operations Task Force in Afghanistan and Iraq fighting Al-Qaeda. One that stuck with me is a small section called The Need-To-Know Fallacy. … More The Need-To-Know Fallacy
Meet Bailey from Queanbeyan. It's a bit embarrassing now but, twelve months ago, I was an insecure pup who often cried and chewed whenever the adults were away. My coat was in pretty patchy condition too. Dad likes to show me these photos to remind me of my humble beginnings. But luckily for me Aunt … More One Year of Bailey Baloney
Many of my most successful data science projects happen by accident. You know, the little skunkworks that arise from a serendipitous hallway conversation where an important and urgent business problem meets a half-baked analytical idea. With a suitable dash of data and the right mix of office politics and corporate kung fu, a baby data-science … More Agile Data Science: On Opportunism
Having spent nearly a decade studying the design and implementation of declarative programming languages in a previous life, I get a bit frustrated whenever I see people getting religious about programming languages and platforms. In data science circles, one active debate is Scala (on Spark) versus SQL (on parallelised relational databases). They are … More The Missing Data Science Language?