In-Database Machine Learning Illustrated

I have just received the excellent news that Apache MADlib, a big data machine learning library for which I was a committer until recently, has graduated to become a top-level Apache project. The basic idea behind MADlib is actually quite interesting and deserves to be more widely known. Massively Parallel Processing (MPP) databases like Greenplum have …

Online Support Vector Machines

I have been studying and experimenting with online learning algorithms for support vector machines (SVMs) for a while now, primarily to understand how they can be used to train SVM models on multi-terabyte datasets. The following technical report describes the NORMA and PEGASOS families of algorithms and gives some observations and relevant …
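As a rough illustration of the kind of online update the PEGASOS family uses, here is a minimal sketch of basic linear Pegasos: stochastic subgradient descent on the L2-regularized hinge loss, one randomly sampled example per step, with step size 1/(λt). It assumes a linear kernel, no bias term, and omits the optional projection step; the function names `pegasos` and `predict` and the toy parameters are hypothetical and not taken from the report.

```python
import random

def pegasos(data, lam=0.1, n_iters=1000, seed=0):
    """Basic linear Pegasos (hypothetical sketch: no bias term, no projection).

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter lambda.
    Returns the final weight vector w as a list of floats.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for t in range(1, n_iters + 1):
        x, y = rng.choice(data)      # sample one training example uniformly
        eta = 1.0 / (lam * t)        # Pegasos step size 1/(lambda * t)
        # Margin is computed with the current iterate w_t, before the update
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink step: subgradient of the (lambda/2)||w||^2 regularizer
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Hinge-loss subgradient contributes only when the margin is below 1
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear score w . x (ties go to +1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

For example, `pegasos([([1.0, 2.0], 1), ([-2.0, -1.0], -1)], lam=0.1, n_iters=500)` returns a weight vector that separates the two points. Because each step touches a single example, memory use does not grow with dataset size, which is what makes this style of algorithm attractive for very large datasets.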