The Education of a Data Scientist: On Sands and Other Irritants

I have learned over the years that what distinguishes good data scientists from great ones is how they handle the seemingly mundane aspects of data analysis: loading large but poorly structured datasets, dealing with missing or poor-quality data, finding the right way to interrogate and transform variables to satisfy the implicit assumptions of statistical inference algorithms, writing clean code for boring things like cross-validation and testing, and engaging in the right level of corporate kung fu to make sure their statistical models don’t die in PowerPoint presentations.

You know, the little irritants of a data scientist’s life that take them away from the “real work” of doing sophisticated statistical modelling using the latest and greatest machine learning / artificial intelligence techniques.

To those who prefer the sexy work over the mundane, allow me to counsel with a Robert Service quote: “Be master of your petty annoyances and conserve your energies for the big, worthwhile things. It isn’t the mountain ahead that wears you out – it’s the grain of sand in your shoe.”

If you remain unconvinced, here’s another sand analogy, this time drawing inspiration from Jon Bentley’s Programming Pearls. Here’s an excerpt from an article, modified with a bit of artistic license:

“A natural pearl begins its life inside an oyster’s shell when an intruder, such as a grain of sand, slips in between one of the two shells of the oyster and the protective layer that covers the mollusk’s organs, called the mantle.

In order to protect itself from irritation, the oyster will quickly begin covering the uninvited visitor with layers of nacre — the mineral substance that fashions the mollusk’s shells. Layer upon layer of nacre, also known as mother-of-pearl, coat the grain of sand until the iridescent gem is formed.”

As Bentley put it, just as natural pearls grow from grains of sand that irritate oysters, his programming pearls have grown from real problems that have irritated real programmers. In a similar fashion, there are a lot of data science pearls waiting to be discovered and harvested.

I will finish with yet another sand quote, this time from Jean-Paul Sartre: “The more sand that has escaped from the hourglass of our life, the clearer we should see through it.”




