### Privacy Preserving Support Vector Machines: A Simple Version

Suppose we have two entities P1 and P2 and P1 holds a training dataset $D1 := \{ (x_i, y_i) \}_{\{i=1...p\}}$ and P2 holds a dataset $D2 := \{ (x_i, y_i) \}_{\{i=1..m\}}$. (Assume for simplicity that the x’s all have the same dimension and each y is a real number. We’ll deal with the case when P1 and P2 measure different sets of variables in a different way later.)

Suppose further that we have a model learning algorithm A that takes in a training dataset and produces a model. (C4.5, Support vector machines, Elastic net, etc are examples of A.) The goal of privacy-preserving model learning in the context of A is to have an algorithm A’ that when jointly executed by P1 and P2 produces a model that would have been obtained from running A on $D1 \cup D2$ but in such a way that P1 never get to see any subset of D2 and P2 never get to see any subset of D1 in the course of the execution of A’.

Looking at the problem formulation, it looks like we can easily construct privacy-preserving versions of many online learning algorithms. One of the simplest and most famous online learning algorithm is of course the Perceptron so let’s look at that.

To achieve privacy-preserving model learning in the context of the Perceptron, we simply get P1 to run Perceptron on D1 to produce a weight vector w1, which is then passed to P2, whereupon it runs Perceptron on D2 but using w1 as the initial weight vector. We can go back and forth between P1 and P2 until the weight vector stabilises, at which point the final weight vector is broadcast to both P1 and P2. At all times, the only communication between P1 and P2 is a weight vector, from which one cannot reconstruct any part of D1 or D2. Further, the final weight vector is obtained as if we are running Perceptron on $D1 \cup D2$.

So that’s Perceptron. The simple scheme works for more modern algorithms like support vector machines (SVM) too. A state-of-the-art online SVM algorithm is Pegasos (Shalev-Shwartz et al, 2011). The primal form of Pegasos works in a similar way to Perceptron so can be handled easily. The dual form of Pegasos –you can see a quick description of the algorithm on Page 10 of this set of notes — is slightly more tricky to handle. In the dual form, the SVM model takes the form of a subset of the training data (these are the so-called support vectors.) For that reason, we cannot just pass the current model around to get a privacy-preserving version of the algorithm. The way to get around that is to “randomise” the current model before passing it on. Specifically, suppose $M1 \subset D1$ is the current model after running the algorithm on P1’s dataset, we will construct M1′ by taking each $(x,y) \in M1$ and randomly construct $\{ (x'_1,y), (x'_2,y), \ldots, (x'_k,y) \}$ in such a way that $x = \sum_{i} x'_i$ and passing M1′ (instead of M1) to P2 for subsequent computation. Because of the linear structure of the model, computing with M1’ gives the same result as computing with M1 from P2’s perspective, therefore giving us the privacy-preserving property again.

Note: Randomising the support vectors doesn’t work beyond the linear case; general Mercer kernels don’t have the linear property.

Note: The above talks about 2 parties but the algorithms obviously generalises to multiple parties.

Note: Another happy note is that Pegasos can be used for classification, regression, and outlier detection, giving us one sledgehammer to use for many different types of problems.

Question: How is this different from getting P1 and P2 to run, say, Pegasos on their individual datasets to obtain two models H1 and H2 and then do appropriate model averaging?

Answer: One way to combine H1 and H2 is to construct a new model $H(x) := \alpha H1(x) + (1-\alpha)H2(x)$, and we need some information about the datasets D1 and D2, as well as the accuracies of H1 and H2 on them to get principled estimates of $\alpha$. The privacy-preserving scheme finesses this whole problem altogether.

There’s a lot of other privacy-preserving algorithms out there. Basically, once people worked out how to compute inner products in the secure multi-party computation setting, the problem of designing privacy-preserving machine learning algorithms is largely solved because — thanks to our 25-year obsession with support vector machines and the kernel trick — most interesting machine learning algorithms can now be rewritten into a form that involves only inner product operations. (I’m being a bit flippant here and understating the significant research that has gone into the design of privacy preserving data mining algorithms.) The question I have is whether we need those more sophisticated algorithms, or do we only need something simpler like what I sketched out above?