# Private ML marketplaces

## Fixing tradeoffs between various private ML strategies


### Introduction

One of the most exciting areas in machine learning right now is private machine learning. At its core, private machine learning is concerned with balancing two competing interests:

- **ML model owners and developers want further improvements from additional training data.** This is what machine learning engineers do for a living, and it is the main strategy for AI-as-a-service (AIaaS) companies: they develop a model based on some sort of data, and then use the proprietary model in some sort of product.
- Data owners want the data to be used fairly. If they do not directly benefit from the model, or if they would face some sort of risk in revealing the data for free, they have no incentive to cooperate with the ML model owners. In other words, **the data providers want to be compensated fairly**.

This post summarizes and discusses some of the approaches previously proposed in this space (e.g., smart contracts, data encryption, transformation and approximation, and federated learning). In addition, this post proposes a way to address this balance using a model-data efficacy approach based on model approximation (with an example using model extraction). This approach has three main objectives:

1. Transacting additional data between model owner and data owner.
2. Fairly pricing the transaction.
3. Preserving the model and data details.

### Problem Setup

To define a common nomenclature throughout this post, let’s define the problem in more detail beyond our two points above:

- $T$ is a trained model with parameters $\Theta$ owned by the $\text{model}$-owner.
- Details regarding $T$ and $\Theta$ are valuable.
- The $\text{data}$-owner(s) own additional training data, $D$, that *may or may not* improve $T$.
- The $\text{data}$-owner(s) want to protect data details, lest they be shared.
- $\Delta \Theta_T(D)$, the resulting update, is a proxy for the benefits $T$ gets from additional training data $D$.

These are the main properties shared among all the approaches explored, and each approach's pros and cons can be framed in terms of this problem setup.

### Protecting models with Homomorphic Encryption

The AIaaS approach depends on some sort of model staying proprietary. For example, many drug discovery companies publish high-level details of their models as press releases, but few of those documents contain implementable details. One of the proposed approaches for protecting these models in use cases like on-device machine learning is homomorphic encryption.

Consider the inference process $I$ with respect to model $T$, $I_T$. Homomorphic encryption takes the operations that make up the model and maps them into a separate but analogous algebraic group (i.e., a homomorphism). Encrypting all operations like this conceals the model; $\mathcal{H}(I_T)$ can perform inference on the data ($D$) and updates on the model. A fully homomorphic encryption $\mathcal{H}$ on $I$ preserves computational correctness without revealing model details, at the expense of efficiency:

$\mathcal{H}(I_T)(D) = I_T(D)$

Additionally, a scaling function $P$ on $I_T(D)$ can be overlaid to facilitate fair pricing and secure transactions (i.e., the OpenMined protocol approach):

$\mathcal{H}(P(I_T(D))) = P(I_T(D))$

In principle, this all sounds pretty useful. The caveat? The encryption and computation are still too slow to be practical. Outside of incredibly simple models, this would likely depend on foundational advances in encryption and possibly specialized ASICs.
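Even so, the homomorphic property itself is easy to demonstrate. The toy sketch below uses textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. The tiny primes and unpadded scheme are purely pedagogical and insecure, and this is not the scheme a practical FHE system would use:

```python
# Toy illustration of the homomorphic property: textbook RSA satisfies
# E(a) * E(b) mod n == E(a * b mod n). NOT secure (tiny primes, no
# padding); it only shows the algebra that FHE generalizes.

def make_keys():
    p, q = 61, 53                       # small demo primes
    n = p * q                           # public modulus
    e = 17                              # public exponent, coprime to (p-1)(q-1)
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)
    return (e, n), (d, n)

def encrypt(pub, m):
    e, n = pub
    return pow(m, e, n)

def decrypt(priv, c):
    d, n = priv
    return pow(c, d, n)

pub, priv = make_keys()
a, b = 7, 12
# Multiply ciphertexts; the product decrypts to the product of plaintexts.
c = (encrypt(pub, a) * encrypt(pub, b)) % pub[1]
assert decrypt(priv, c) == (a * b) % pub[1]
```

A fully homomorphic scheme extends this idea to both addition and multiplication, which is what makes arbitrary encrypted inference possible, and also what makes it so slow.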

### Data: Encryption or Approximation

Homomorphic encryption can also work the other way: mapping the data itself into some other algebraic group using encryption. Unlike the case of model-operation encryption, we also have the option of substituting an approximation of the data (i.e., differential privacy):

$D' \sim D \mid \Delta \Theta_T(D') \sim \Delta \Theta_T(D)$

In cases where compliance is a concern, differential privacy and/or homomorphically encrypted data are usually seen as the ideal. Unlike model encryption, this approach also amounts to pure inference.

However, the privacy this approach offers for black-box models tends to break down if the model updates are visible. If one can track the updates to the model, even if the training is distributed, one can easily reconstruct the data. The approximation strategy, while technically simpler than the encryption, also requires specialized network architectures that are custom-built for the task (e.g., Cleverhans’ PATE).
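To make the approximation option concrete, here is a minimal sketch of the Laplace mechanism, the basic building block of differential privacy: release a statistic plus noise calibrated to how much one record can change it. The function names and the clipped-mean example are illustrative, not from any particular library:

```python
import random

def laplace(scale):
    # A Laplace(0, scale) sample: the difference of two exponentials
    # is Laplace-distributed.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_mean(data, lo, hi, epsilon):
    # Release the mean of `data`, clipped to [lo, hi], via the Laplace
    # mechanism: one record can shift the clipped mean by at most
    # (hi - lo) / n, so noise of scale sensitivity / epsilon suffices.
    clipped = [min(max(x, lo), hi) for x in data]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (hi - lo) / len(clipped)
    return true_mean + laplace(sensitivity / epsilon)

ratings = [4.2, 3.7, 5.1, 4.9, 4.4]
noisy = private_mean(ratings, lo=0.0, hi=10.0, epsilon=0.5)
```

The weakness described above shows up here too: a single noisy release is private, but an observer who sees many model updates derived from the same records can average the noise away unless the privacy budget is tracked across releases.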

### Federated Machine Learning

One of the more popular approaches to separating models and data is federated machine learning. This is a training strategy that features distributed, collaborative learning across multiple nodes in a network, with one or multiple models being updated in pieces. It is even possible to combine the update aggregation step with differential privacy. This approach has already been deployed in real-world products, such as Android devices sharing location data with models designed to learn traffic patterns.

It’s not a free lunch, though. Of all the approaches described so far, this may be the most complicated to set up. Even without differential privacy on top, integrating multiple classifiers and regressors necessitates customized protocol design and optimization. There’s also a reason it’s rarely used outside of large companies like Alphabet: it requires many, *many* users for privacy to be enforced (algorithms like random rotation, for example, cannot be used effectively for privacy purposes with fewer than a hundred users).
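The aggregation step at the heart of this strategy can be sketched in a few lines. The following is a minimal, illustrative version of federated averaging (FedAvg-style) over plain weight vectors; real deployments add secure aggregation, compression, and the differential-privacy machinery mentioned above:

```python
def federated_average(client_weights, client_sizes):
    # FedAvg-style aggregation: average the clients' locally trained
    # weight vectors, each weighted by its share of the total training
    # examples. Raw data never leaves the clients; only weights move.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w

# Two clients with different amounts of local data: the larger client
# contributes proportionally more to the global model.
w_global = federated_average([[1.0, 0.0], [0.0, 1.0]], [30, 10])
```

The privacy caveat above is visible even in this sketch: with only two clients, each one's weights are nearly recoverable from the average, which is why large user counts are essential.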

### Model approximation

Model approximation is often researched in terms of its security risks. There are plenty of categories of attacks that can be used to steal information about black-box models, given access to information like the last layer's logits. However, model approximation also provides an opportunity to address the data-pricing problem. The following relationship details, at a high level, how we price the data:

$T' \sim T \mid \Delta \Theta_{T'}(D) \sim \Delta \Theta_T(D)$

Below is the pseudocode for using model extraction as a Model-Data Efficacy (MDE) strategy. This approach works with any model; black-box models can be handled in escrow.

**Data:**

- black-box $T$
- $\Theta$
- $\text{MDE } f$
- data $D_{\text{train}}$
- data $D_{\text{test}}$
- additional data $D$
- ideal model size $\tau$

**Algorithm:**
Let $T' \leftarrow f(T)$ // learn a decision tree as in [2]
**while** *not* $(\forall d^{[i]} \in D_{\text{train}}: \Delta \mathcal{L}_{\text{test}}(d^{[i]}, T') < \epsilon)$ **do**

$\Theta \leftarrow \Theta + \Delta \Theta_{T'}(d)$ // train with $d \in D_{\text{train}}$

**end**
**while** $\text{sizeof}(T') > \tau$ **do**

$\text{trim or compress } T'$ // for optional encryption

**end**

**Result:**

- Price of $D$ w.r.t. $T$

This approach also lends itself to both interpretability and model testing. The pricing algorithm trades accuracy for size; if the resulting model is sufficiently small, it can be encrypted as well.

### Extension to Data Marketplaces

So, we’ve discussed a variety of approaches to the tradeoff problem described at the beginning. On the plus side, we can clearly make the case that useful data can be priced accordingly, and vice versa.

One downside of this approach is that, due to mismatches in representation between training data and test data (e.g., insufficient data), we could easily end up with a market in which duplicate data is priced for reducing error.

### Solution

$\forall d^{[i]} \in D_{\text{train}}: \Delta \mathcal{L}_{\text{test}}(d^{[i]}, T') < \epsilon$

That is, we can overfit the approximation to $T$ with $D_{\text{train}}$ until it no longer prices duplicate data.
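The stopping rule above can be sketched as a loop that keeps absorbing training points into the surrogate until no remaining point changes the test loss by $\epsilon$ or more; after convergence, re-offering any absorbed point is worth less than $\epsilon$. As before, the 1-nearest-neighbour surrogate and all names are illustrative stand-ins:

```python
def fit_until_converged(black_box, train_x, test_x, eps):
    # Grow the surrogate's query set until no training point changes
    # the test loss by eps or more -- the "forall d: delta-loss < eps"
    # stopping rule. (1-NN stands in for the extracted model T'.)
    test_pairs = [(x, black_box(x)) for x in test_x]
    queried = [(train_x[0], black_box(train_x[0]))]

    def predict(x, memory):
        return min(memory, key=lambda q: abs(q[0] - x))[1]

    def loss(memory):
        return sum((predict(x, memory) - y) ** 2 for x, y in test_pairs) / len(test_pairs)

    changed = True
    while changed:
        changed = False
        for d in train_x:
            candidate = queried + [(d, black_box(d))]
            if loss(queried) - loss(candidate) >= eps:
                queried = candidate
                changed = True
    return queried, loss

T = lambda x: x * x   # hidden black-box model
queried, loss = fit_until_converged(T, [0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5], eps=1e-6)
# After convergence, duplicating any training point improves the test
# loss by less than eps, so duplicates price near zero.
```

The loop terminates because each accepted point strictly reduces a bounded loss, and the exit condition is precisely the quantified criterion above.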

In summary, to trade additional training data fairly and practically, Model-Data Efficacy approaches based on model approximation of black-box models can be used. More specifically, these approaches price the data without training the original model on it.

*Approximating the effect of data on the model through model approximation (Model-Data Efficacy) is a moderately practical solution for preserving model and data privacy. Model extraction, for example, can be used for fair pricing: useless data can be priced minimally while useful data can be priced high.*

| Approach | $D$ Leakage | $T$ Leakage | Practicality | Fairness | Examples |
|---|---|---|---|---|---|
| Giving up data | High | Low | High | Low | Default ML |
| Giving up model | Low | High | High | Low | Most academic researchers |
| Escrow smart contract | Medium | Medium | Low | High | Numerai, Enigma |
| Encrypting the model | High | Low | Low | N/A | Corti, PySyft |
| Encrypting the data | Medium | Low | Low | Medium | Microsoft SEAL |
| Federated learning | Low | Low | Low | High | Google (for Android data) |
| Model-Data Efficacy | Low | Low | Medium | High | DeMoloch |

Against black-box models, encrypting or approximating data has flaws regarding privacy. While federated learning with differential privacy achieves privacy for both the model owner and the data owner, it is less practical for one-time transactions.

### Future work

There are still plenty of ways to refine this approach. For example, pre-training data synthesized from existing $D_{\text{train}}$ would eliminate tuning on $D_{\text{test}}$. This would also have the added benefit of refining $\text{usefulness}$ into a metric for novelty.

As with any ML approach (especially one in which markets would be involved), there will need to be some kind of security against adversarial attacks against the model owner. From the ML perspective, there are a variety of tools for adding adversarial robustness (e.g., Cleverhans, Mr. Ed, model pruning). One of the best forms of security, however, would be to add some sort of transactional security that makes adversarial attacks prohibitively expensive to carry out in most instances.

### References

- Aono, Yoshinori, et al. “Privacy-preserving deep learning via additively homomorphic encryption.” *IEEE Transactions on Information Forensics and Security* 13.5 (2017): 1333-1345.
- Bastani, Osbert, Carolyn Kim, and Hamsa Bastani. “Interpretability via model extraction.” *arXiv preprint arXiv:1706.09773* (2017).