The life of a sea squirt has an important lesson for data science.
For those who haven’t heard, sea squirts come to life as larvae that swim around freely. In that state, however, they are not capable of feeding, so they soon settle on the bottom of the ocean and cement themselves headfirst to a spot where they will spend the rest of their lives. On settling down, the squirt digests its primitive eye, its spine-like notochord, and, finally, its rudimentary little brain! (More about sea squirts.)
We only need a brain if we have to act.
I first heard about the story of sea squirts in a talk by Prof Daniel Wolpert, where he argued convincingly that the brain is Bayesian. It’s a story I have been telling myself and others over and over again because I think it has an important lesson for data science.
Loosely speaking, there are two types of data science activities:
- Knowledge discovery activities (e.g. customer segmentation, customer lifetime value analysis, trade volume forecasting, etc)
- Action-optimisation activities (e.g. personalised advertising, error diagnosis, process optimisation, etc)
Many organisations start their data science journey by putting an open-ended knowledge discovery question to their (budding) data scientists: What are some (new) insights we can draw about our business using big-data techniques and technologies? It sounds like a reasonable question, but I think it’s the wrong place to start. The question is a loaded one that can easily lead to analysis paralysis and loss of credibility for the data science team, for the following reasons:
- Knowledge discovery is an open-ended process that cannot be easily time-boxed; insights will usually invite more questions and qualifications, ad infinitum, without leading to concrete measurable business benefits. (Management are very good at asking increasingly detailed questions.)
- It’s high risk because it can pit the (usually new) data science team against corporate old-timers with their domain knowledge and possible biases. It doesn’t help that the expectation is usually high that the data science team will produce something truly new; such is the hype surrounding data science.
- The data science team can become disillusioned if their hard work doesn’t lead to business results and most of their data insights die silently in PowerPoint presentations. (Most insights one can get from a pure discovery exercise are interesting but non-actionable.)
Many a new data science team has fallen victim to this little trap and faded quickly into corporate irrelevance…
As the sea squirt story illustrates, the reason for having a “corporate brain” is to act. I firmly believe the right way for an organisation to begin its data science journey is to start with a business action it wants to take and ask how data science can be used to act (more) optimally. In particular, this is the principle I use:
- Always start with action-optimisation activities (prioritised by risk-adjusted business value) and restrict knowledge-discovery activities to the context of action-optimisation activities.
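To make "prioritised by risk-adjusted business value" a little more concrete, here is a minimal sketch of one way such a ranking could be computed. Everything in it is an assumption for illustration: the `Candidate` fields, the figures, and the definition of risk-adjusted value as probability of success times business value, minus cost. An organisation may well define the quantity differently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical action-optimisation project under consideration."""
    name: str
    business_value: float       # estimated benefit if the project succeeds
    success_probability: float  # subjective estimate between 0 and 1
    cost: float                 # estimated cost of attempting the project


def risk_adjusted_value(c: Candidate) -> float:
    # Assumed definition: expected benefit minus cost.
    return c.success_probability * c.business_value - c.cost


def prioritise(candidates):
    # Highest risk-adjusted value first.
    return sorted(candidates, key=risk_adjusted_value, reverse=True)


# Made-up figures, purely for illustration.
candidates = [
    Candidate("process optimisation", 800.0, 0.3, 150.0),
    Candidate("personalised advertising", 500.0, 0.6, 100.0),
    Candidate("error diagnosis", 200.0, 0.9, 50.0),
]

ranked = prioritise(candidates)
for c in ranked:
    print(f"{c.name}: {risk_adjusted_value(c):.1f}")
```

Note that the project with the largest headline value (process optimisation) ends up last once its low chance of success and its cost are taken into account, which is the point of adjusting for risk before choosing where to start.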
Why does this work better?
- Having a project with concrete business goals and actions will facilitate stakeholder buy-in from the word go. The success or otherwise of such data science work is also usually easy to measure.
- There’s usually a deadline by which to act, so the project will be time-boxed. A time-box is a way to ensure quick wins and fast failures.
- All discovery activities are targeted and contextualised; the risk of “analysis paralysis” is minimised.
I have yet to see a data science project that began with the question of how to optimise a business action fail, least of all through irrelevance.