Having spent nearly a decade studying the design and implementation of declarative programming languages in a previous life, I get a bit frustrated whenever I see people getting religious about programming languages and platforms. In the data science circle, an active discussion is around Scala (on Spark) vs SQL (on parallelised relational databases). They are … More The Missing Data Science Language?
Everything that is old is new again. That’s the feeling I get when I look at Spark, which I learned is one of the fastest growing Apache projects in the big data space. There is remarkable similarity in the underlying architecture between Spark and that of a Massively Parallel Processing (MPP) Database like Greenplum or … More Apache Spark vs MPP Databases