Extending the Paillier Cryptosystem to Handle Floating Point Numbers

The Paillier Cryptosystem is a partially homomorphic encryption scheme that supports two important operations: addition of two encrypted integers and multiplication of an encrypted integer by an unencrypted integer. In practice, many applications of Paillier require extending the underlying scheme beyond integers to handle floating-point numbers. For example, just about every popular machine learning …
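The two homomorphic operations described above can be seen in a toy Paillier implementation. This is a minimal sketch, assuming the common simplification g = n + 1 and deliberately tiny primes; it is insecure, is not the code from the post, and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `mul_plain`) are hypothetical. Multiplying two ciphertexts adds the underlying plaintexts, and raising a ciphertext to an unencrypted integer k multiplies the plaintext by k.

```python
import math
import random


def keygen(p=293, q=433):
    # Toy primes for illustration only; real deployments use primes of 1024+ bits.
    n = p * q
    n2 = n * n
    g = n + 1  # common simplifying choice of generator (an assumption here)
    lam = math.lcm(p - 1, q - 1)
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    # Encrypts a non-negative integer m < n with fresh randomness r coprime to n.
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    # Homomorphic addition: E(m1) * E(m2) mod n^2 decrypts to m1 + m2 (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)


def mul_plain(pub, c, k):
    # Multiplication by a plaintext integer: E(m)^k mod n^2 decrypts to k * m (mod n).
    n, _ = pub
    return pow(c, k, n * n)


if __name__ == "__main__":
    pub, priv = keygen()
    c1, c2 = encrypt(pub, 15), encrypt(pub, 27)
    print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))  # prints 42
    print(decrypt(pub, priv, mul_plain(pub, c1, 3)))  # prints 45
```

The sketch only handles non-negative integers smaller than n (and needs Python 3.9+ for `math.lcm`), which is exactly the integer-only limitation the post sets out to address.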

The Education of a Data Scientist: On Sands and Other Irritants

I have learned over the years to distinguish good data scientists from great ones by the way they handle the seemingly mundane aspects of data analysis: tasks like loading large but poorly structured datasets, dealing with missing or poor-quality data, finding the right way to interrogate and transform variables to satisfy …

How to Link Millions of Addresses with Ten Lines of Code in Ten Minutes

Solving big hairy problems like detecting complex financial crimes requires solving a series of smaller, mundane, but technically non-trivial problems. Performing efficient record linkage on large databases with tens to hundreds of millions of rows is one such pesky problem. A few of my colleagues have just made a small dent in the overall …

Detecting Financial Crimes: Current State, Limitations, and A Way Forward

Financial Intelligence Units (FIUs) around the world collect data like threshold transaction reports, international fund transfer reports, and suspicious matter/activity reports from Reporting Entities (REs), which include banks, money remitters, casinos, law firms, real-estate companies, and financial companies. They may also get data about entities of interest from partner agencies (PAs) like law-enforcement agencies (LEAs) …

In-Database Machine Learning Illustrated

I have just received the excellent news that Apache MADlib, a big data machine learning library for which I was a committer until recently, has graduated to become a top-level Apache project. The basic idea behind MADlib is actually quite interesting and deserves to be more widely known. Massively Parallel Processing (MPP) databases like Greenplum have …

Setting up a Data Science Practice: Analytics Processes

In this third post on setting up a data science practice, I address some of the analytics processes that need to be in place to maximise value from analytics. After more than two decades of practice and development, there are now well-established data analytics frameworks like the Cross Industry Standard Process for Data Mining. …

Setting up a Data Science Practice: People Dimension

In the previous post, we discussed the key principles of setting up a data science practice. In this post, we'll discuss the people dimension. The following should be read as suggestions, not prescriptions: there is more than one way to set up a data science practice. Critical to the success of a data science practice are …

Setting up a Data Science Practice: Fundamental Principles

I have been involved in the setup of several data science practices in both industry and government. Here are a few key principles I use in establishing a data science practice. Principle 1: Building a predictive enterprise is, first and foremost, about building a human infrastructure. Many companies mistakenly believe that analytics is primarily about software …

Customer Lifetime Value and Its Application in Retail Analytics

Customer Lifetime Value (CLV) is a relatively new framework stemming from the idea of “treatment of customers as an asset”, in use at innovative companies like Harrah’s, IBM, and Capital One. The definition is a fairly natural one: CLV is the net present value of profit from all the future purchases a customer is going to …
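The net-present-value definition above can be made concrete with a few lines of arithmetic. This is a minimal sketch, assuming a finite horizon of projected per-period profits, a constant per-period discount rate, and end-of-period discounting; the function name `customer_lifetime_value` and the example figures are hypothetical and not taken from the post, which may model retention and purchase behaviour in more detail.

```python
def customer_lifetime_value(profits, discount_rate):
    """Net present value of a projected stream of per-period profits.

    profits[t] is the (hypothetical) profit expected at the end of period t + 1;
    discount_rate is the per-period rate used to discount future cash flows.
    """
    return sum(p / (1 + discount_rate) ** (t + 1) for t, p in enumerate(profits))


if __name__ == "__main__":
    # Three periods of 100 profit each, discounted at 10% per period.
    print(round(customer_lifetime_value([100, 100, 100], 0.10), 2))  # prints 248.69
```

Discounting makes a unit of profit earned later worth less today, so the three nominal profits of 100 sum to about 248.69 rather than 300.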

Lifting the Fog on Machine Learning Maths

Confession: as a computer scientist, I have always been comfortable with discrete mathematics. Continuous maths, however, especially the kind commonly seen in statistical machine learning, has always been a challenge for me. In fact, I spent the last 15 years of my professional life in a more-or-less constant fog of partial understanding when it …