How to Link Millions of Addresses with Ten Lines of Code in Ten Minutes

Solving big, hairy problems such as detecting complex financial crimes requires solving a series of smaller, mundane but technically non-trivial problems. Performing efficient record linkage on large databases with tens to hundreds of millions of rows is one such pesky problem. A few of my colleagues have just made a small dent in the overall …

From Words to Concepts: Explicit Semantic Analysis

Everyone with a rudimentary understanding of text analytics knows about term frequency-inverse document frequency (TF-IDF) vectors. Less well known, but deserving of wider appreciation and use, is a related little trick called Explicit Semantic Analysis (ESA), introduced by E. Gabrilovich and S. Markovitch, for which they were awarded the 2014 IJCAI-JAIR Best Paper Prize (https://www.jair.org/bestpaper.html). ESA …