# Private ML marketplaces

## Fixing tradeoffs between various private ML strategies

Objectives

• Transact additional data between the model owner and the data owner.
• Fairly price the transaction.
• Keep model and data details private.

### Introduction

Model owners want to improve their models with additional training data, and data owners want to be compensated fairly. We discuss previously proposed approaches, including smart contracts, data encryption, transformation and approximation, and federated learning. We propose a Model-Data Efficacy approach based on model approximation and give an example using model extraction.

### Protecting models with Homomorphic Encryption

Consider the inference process $I$ with respect to model $T$. Encrypting all operations conceals the model: $\mathcal{H}(I_T)$ can perform inference on the data and updates on the model. A fully homomorphic encryption $\mathcal{H}$ of $I$ preserves computational correctness without revealing model details, at the expense of efficiency:

$\mathcal{H}(I_T)(D)=I_T(D)$

Additionally, a scaling (pricing) function $P$ can be overlaid on $I_T(D)$ to facilitate fair pricing and secure transactions:

$\mathcal{H}(P(I_T(D)))=P(I_T(D))$

Yet the encryption and computation are too slow to be practical.
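As a toy illustration of the identity $\mathcal{H}(I_T)(D)=I_T(D)$ (not part of the original text), the sketch below implements textbook Paillier encryption, which is additively homomorphic, and evaluates a linear score $w \cdot x + b$ entirely on ciphertexts. The primes, weights, and inputs are illustrative; a fully homomorphic scheme supporting arbitrary circuits is far heavier.

```python
from math import gcd
import random

def lcm(a, b):
    return a * b // gcd(a, b)

p, q = 17, 19            # toy primes; real deployments use >=1024-bit primes
n, n2 = p * q, (p * q) ** 2
lam = lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)     # valid decryption constant when g = n + 1

def enc(m):
    """Paillier encryption: Enc(m) = g^m * r^n mod n^2."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Paillier decryption: L(c^lam mod n^2) * mu mod n, L(x) = (x-1)//n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Encrypted linear inference: Enc(a)*Enc(b) = Enc(a+b), Enc(a)^k = Enc(k*a).
w, b, x = [3, 5], 7, [2, 4]
c = enc(b)
for wi, xi in zip(w, x):
    c = (c * pow(enc(xi), wi, n2)) % n2
assert dec(c) == sum(wi * xi for wi, xi in zip(w, x)) + b   # w.x + b = 33
```

A real marketplace would still need the pricing overlay $P$ and key management; this only demonstrates correctness of encrypted evaluation.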

Figure 1: Federated learning: distributed learning with differential privacy guarantees, especially useful for simple models and many users.

### Problem Setup

• $T$ is a trained model with parameters $\Theta$, owned by the model owner. Details of $T$ and $\Theta$ are valuable.
• The data owner owns additional training data, $D$, that may improve $T$. The data owner wants to protect the data's details, lest they be shared.
• $\Delta \Theta_T(D)$, the resulting update, proxies the benefit $T$ gains from the additional training data $D$.

### Data: Encrypt or Approximate

$D' \sim D \mid \Delta \Theta_T(D') \approx \Delta \Theta_T(D)$

• Suitable for compliance, since the model performs pure inference on $D'$, but
• not private against black-box models if the updates are visible; requires customized networks.
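A minimal numpy sketch (illustrative, not from the original) of the relation above for a least-squares model: a lightly perturbed dataset $D'$ induces nearly the same gradient update as $D$, so the model owner can evaluate the update without seeing the raw records. All names and noise scales are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_update(theta, X, y, lr=0.1):
    """One gradient step on squared loss: a stand-in for Delta Theta_T(D)."""
    return -lr * X.T @ (X @ theta - y) / len(y)

theta = np.array([1.0, -2.0])                    # current model parameters
X = rng.normal(size=(100, 2))                    # data owner's D
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.1, size=100)

# D' ~ D: a lightly perturbed release of the data.
X_p = X + rng.normal(scale=0.01, size=X.shape)
y_p = y + rng.normal(scale=0.01, size=y.shape)

# The induced updates are close: Delta Theta_T(D') ≈ Delta Theta_T(D).
assert np.allclose(grad_update(theta, X, y),
                   grad_update(theta, X_p, y_p), atol=0.05)
```

In practice $D'$ would come from a stronger transformation (encryption or synthesis); simple perturbation is exactly what fails to stay private against a black-box model whose updates are visible.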

### Federation: Gains and Losses

• Distributed, collaborative learning.
• Differentially private update aggregation.
• Complicated setup: on-device training (especially useful for simple models) plus customized protocol design and optimization for integrating classifiers.
• Requires many users to be private (using random rotation, etc., to ensure privacy).
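To make the aggregation bullet concrete, here is a hedged numpy sketch of server-side differentially private federated averaging: each client update is norm-clipped, averaged, and Gaussian noise calibrated to the clipping bound is added. `clip` and `noise_std` are illustrative parameters, not values from the original.

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_fed_avg(updates, clip=1.0, noise_std=0.1):
    """Clip each client's update to L2 norm `clip`, average, then add
    Gaussian noise scaled to the per-client sensitivity clip/n."""
    n = len(updates)
    clipped = [u * min(1.0, clip / np.linalg.norm(u)) for u in updates]
    mean = np.mean(clipped, axis=0)
    return mean + rng.normal(scale=noise_std * clip / n, size=mean.shape)

# Privacy improves with more participants: the noise needed per client
# shrinks as 1/n, so aggregation is only accurate with many users.
updates = [rng.normal(size=4) for _ in range(1000)]
agg = dp_fed_avg(updates)
assert agg.shape == (4,) and np.linalg.norm(agg) < 1.0
```

This is why the bullet above notes that many users are required: with few clients, the noise needed for the same guarantee swamps the signal.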

### Model approximation

$T' \sim T \mid \Delta \Theta_{T'}(D) \approx \Delta \Theta_T(D)$

Figure 3: A pricing function $P(T'): D \longrightarrow \mathbb{R}^+$ is composed from the approximation $T'$.

Data: black-box $T$ with parameters $\Theta$, MDE $f$, data $D_{train}$, $D_{test}$, additional data $D$, ideal model size $t$.
Result: price of $D$ w.r.t. $T$.

1. $T' \leftarrow f(T)$ // learn a decision tree as in [2]
2. While not $\forall d^{[i]} \in D_{train}: \Delta \mathcal{L}_{test}(d^{[i]}, T') < \epsilon$, do:
   $\Theta' \leftarrow \Theta' + \Delta \Theta_{T'}(d^{[i]})$
3. While $\text{sizeof}(T') > t$, do:
   trim or compress $T'$ // for optional encryption

Algorithm 1: Model extraction as an MDE. It draws properties from any model, and black boxes can be handled in escrow. It also applies to interpretability and model testing. It trades accuracy for size; encrypt $T'$ if it is tiny.
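A self-contained numpy sketch of Algorithm 1's core idea under simplifying assumptions: the decision-tree extraction of [2] is replaced by a least-squares linear surrogate $T'$ fit on query/label pairs from the black box, and the price of $D$ is the test-accuracy gain from refitting $T'$ with $D$ added. Every function name and constant here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_T(X):
    """The model owner's black box; only its predictions are observable."""
    return (np.tanh(X[:, 0]) + 0.5 * X[:, 1] > 0).astype(float)

def fit_surrogate(X, y):
    """Stand-in MDE f: a least-squares linear surrogate T'."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)  # labels to {-1, 1}
    return w

def predict(w, X):
    return (np.hstack([X, np.ones((len(X), 1))]) @ w > 0).astype(float)

# Extract T' by querying the black box.
X_q = rng.normal(size=(500, 2))
y_q = black_box_T(X_q)
w0 = fit_surrogate(X_q, y_q)

# Held-out set standing in for D_test.
X_t = rng.normal(size=(1000, 2))
y_t = black_box_T(X_t)

def price(X_d, y_d):
    """Price additional data D by the accuracy it buys the surrogate."""
    before = (predict(w0, X_t) == y_t).mean()
    w1 = fit_surrogate(np.vstack([X_q, X_d]), np.concatenate([y_q, y_d]))
    after = (predict(w1, X_t) == y_t).mean()
    return max(0.0, after - before)

X_d = rng.normal(size=(200, 2))
p_good = price(X_d, black_box_T(X_d))                       # consistent labels
p_junk = price(X_d, rng.integers(0, 2, 200).astype(float))  # random labels
assert 0.0 <= p_good <= 1.0 and 0.0 <= p_junk <= 1.0
```

Data that genuinely improves the surrogate prices positively, while junk or duplicate data prices at or near zero, which is the pricing property the algorithm aims for.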

### Discussion

• Data that is useful can be priced highly, and vice versa.
• Due to mismatches in representation between training data and test data (e.g., insufficient data), duplicate data may be priced as reducing error.

Solution: $\forall d^{[i]} \in D_{train}: \Delta \mathcal{L}_{test}(d^{[i]}, T') < \epsilon$. That is, overfit $T'$ with $D_{train}$ until the resulting approximation does not price duplicate data.

### Conclusion

To trade additional training data fairly and practically, we introduce Model-Data Efficacy approaches, based on model approximation of black-box models, that price the data without training it on the original model.

Approximating the effect of data on the model through model approximation (Model-Data Efficacy) is a moderately practical solution for preserving model and data privacy. Model extraction, for example, can be used for fair pricing: useless data can be priced minimally while useful data can be priced highly.

| Approach | $D$ leakage | $T$ leakage | Practicality | Fairness | Examples |
| --- | --- | --- | --- | --- | --- |
| Giving up data | High | Low | High | Low | Default ML |
| Giving up model | Low | High | High | Low | Academic researchers |
| Escrow smart contract | Medium | Medium | Low | High | Numerai, Enigma |
| Encrypting the model | High | Low | Low | N/A | Corti, PySyft |
| Encrypting the data | Medium | Low | Low | Medium | Microsoft SEAL |
| Federated learning | Low | Low | Low | High | Google (for Android data) |
| Model-Data Efficacy | Low | Low | Medium | High | DeMoloch |

Against black-box models, encrypting or approximating the data has flaws regarding privacy. While federated learning with differential privacy achieves privacy for both the model owner and the data owner, it is less practical for one-time transactions.

### Future work

• Pre-training data synthesis from the existing $D_{train}$ eliminates tuning on $D_{test}$ and refines *usefulness* into a metric for *novelty*.
• Stronger transactional security against adversarial attacks on the model owner.

### References

1. Aono, Yoshinori, et al. “Privacy-preserving deep learning via additively homomorphic encryption.” IEEE Transactions on Information Forensics and Security 13.5 (2017): 1333-1345.
2. Bastani, Osbert, Carolyn Kim, and Hamsa Bastani. “Interpretability via model extraction.” arXiv preprint arXiv:1706.09773 (2017).
