Outlier detection is an important tool in risk modelling. When data are distributed across multiple locations and data privacy is a concern, we need privacy-preserving techniques for outlier detection. Linked here is a tutorial introduction to the topic that I recently prepared.
The presentation builds on a few resources, including
- Du & Atallah, Privacy-Preserving Cooperative Statistical Analysis, 2001
- Vaidya & Clifton, Privacy-Preserving Outlier Detection, 2004
I guess the key takeaways are these:
- Privacy-preserving (PP) statistical algorithms can appear difficult to appreciate at first because they sit at the intersection of data science and cryptography, both substantial topics in their own right. However, I find that once we have a good handle on a few primitives (oblivious transfer, secure scalar product, secure comparison, etc.), many PP algorithms become relatively easy to understand and implement.
- There are now practical PP algorithms for a range of problems, so data science practitioners should really start paying attention.
- The foundations of PP algorithms, including the important secure multi-party computation (MPC) problem, are increasingly being combined with blockchain technology to produce potentially disruptive systems such as Enigma from MIT.
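
To give a flavour of the primitives mentioned in the first takeaway, here is a minimal sketch of a secure scalar product, simulated in a single Python process. It assumes a semi-honest "commodity server" that hands out correlated randomness and does not collude with either party; this is one well-known style of protocol and not necessarily the exact construction in the papers listed above. The names `secure_scalar_product`, `rand_vec`, `dot` and the modulus `P` are my own illustrative choices. Alice holds x, Bob holds y, and at the end each holds a random-looking share whose sum (mod P) is the dot product of x and y.

```python
import secrets

# Illustrative prime modulus (an assumption); inputs and the true
# dot product are assumed to be non-negative integers smaller than P.
P = 2**61 - 1

def dot(a, b):
    """Dot product of two integer vectors, reduced mod P."""
    return sum(ai * bi for ai, bi in zip(a, b)) % P

def rand_vec(n):
    """Vector of n uniformly random values in [0, P)."""
    return [secrets.randbelow(P) for _ in range(n)]

def secure_scalar_product(x, y):
    """Simulate the three roles in one process; return (v1, v2)."""
    n = len(x)

    # Commodity server: correlated randomness with ra + rb = Ra . Rb (mod P).
    # (Ra, ra) goes to Alice and (Rb, rb) goes to Bob.
    Ra, Rb = rand_vec(n), rand_vec(n)
    ra = secrets.randbelow(P)
    rb = (dot(Ra, Rb) - ra) % P

    # Alice masks x with Ra and sends the result to Bob.
    x_masked = [(xi + ri) % P for xi, ri in zip(x, Ra)]
    # Bob masks y with Rb and sends the result to Alice.
    y_masked = [(yi + ri) % P for yi, ri in zip(y, Rb)]

    # Bob picks his own random share v2 and sends u to Alice.
    v2 = secrets.randbelow(P)
    u = (dot(x_masked, y) + rb - v2) % P

    # Alice removes the masking terms to obtain her share v1.
    v1 = (u - dot(Ra, y_masked) + ra) % P
    return v1, v2

x = [3, 1, 4]   # Alice's private vector
y = [2, 7, 1]   # Bob's private vector
v1, v2 = secure_scalar_product(x, y)
print((v1 + v2) % P)  # 17, i.e. 3*2 + 1*7 + 4*1
```

Neither party sees the other's raw vector, only a masked copy, and the result stays split into two shares until both agree to combine them. This is handy for distance-based outlier detection, since a squared Euclidean distance between points held by different parties can be written in terms of scalar products. A real deployment would need proper networking, authenticated channels and a careful security analysis, none of which this sketch attempts.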