Private ML marketplaces
Fixing tradeoffs between various private ML strategies
Objectives
 Transact additional training data between the model owner and the data owner.
 Fairly price the transaction.
 Preserve model and data details.
Introduction
Model owners want to improve their models further with additional training data, and data owners want to be compensated fairly. We discuss approaches previously proposed, including smart contracts, data encryption, transformation, and approximation, and federated learning. We propose a Model-Data Efficacy approach based on model approximation, and give an example using model extraction.
Protecting models with Homomorphic Encryption
Consider the inference process $I$ with respect to model $T$. Encrypting all operations conceals the model; $\mathcal{H}(I_T)$ can perform inference on the data and updates on the model. A fully homomorphic encryption $\mathcal{H}$ on $I$ preserves computational correctness without revealing model details, at the expense of efficiency:
$\mathcal{H}(I_T)(D) = I_T(D)$
Additionally, a pricing function $P$ can be overlaid on $I_T(D)$ to facilitate fair pricing and a secure transaction:
$\mathcal{H}(P(I_T(D))) = P(I_T(D))$
Yet the encryption and computation are too slow to be practical.
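To make the homomorphic identity concrete, here is a toy sketch using textbook Paillier encryption, which is only additively homomorphic and uses insecure parameter sizes; all function names are illustrative, not a real library API. It evaluates a linear model's score on encrypted features, so the data owner's inputs stay hidden during inference:

```python
import math
import random

# Toy textbook Paillier (additively homomorphic; tiny, INSECURE parameters).

def keygen(p=61, q=53):
    n = p * q
    lam = math.lcm(p - 1, q - 1)     # Carmichael's lambda(n)
    g = n + 1                        # standard simplification
    mu = pow(lam, -1, n)             # since L(g^lam mod n^2) = lam
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    n2 = n * n
    ell = (pow(c, lam, n2) - 1) // n   # L(x) = (x - 1) / n
    return (ell * mu) % n

def encrypted_dot(pk, enc_x, weights):
    # Multiplying ciphertexts adds plaintexts; exponentiation scales them,
    # so a linear model's score is computed without seeing the features.
    n, _ = pk
    n2 = n * n
    acc = encrypt(pk, 0)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

pk, sk = keygen()
x = [3, 5, 2]                        # data owner's private features
w = [2, 4, 1]                        # model owner's plaintext weights
enc_x = [encrypt(pk, xi) for xi in x]
score = decrypt(pk, sk, encrypted_dot(pk, enc_x, w))
# score == 2*3 + 4*5 + 1*2 == 28
```

Even at these toy sizes the modular exponentiations dominate the cost of the plain dot product, which previews why full homomorphic pipelines are impractical at scale.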
Figure 1: Federated learning: distributed learning with differential privacy guarantees, especially useful for simple models and many users.
Problem Setup
 $T$ is a trained model with parameters $\Theta$, owned by the model owner. Details regarding $T$ and $\Theta$ are valuable.
 The data owner owns additional training data, $D$, that may improve $T$. The data owner wants to protect data details, lest they be shared.
 $\Delta \Theta_T(D)$, the resulting update, proxies the benefit $T$ gains from additional training data $D$.
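As a concrete reading of this setup, $\Delta \Theta_T(D)$ can be taken as the parameter change induced by fine-tuning on $D$. A minimal sketch for a linear model, with illustrative names and an arbitrary learning rate:

```python
# Sketch: Delta Theta_T(D) as the parameter change additional data D
# induces on a linear model y ~ w . x (illustrative, not the paper's code).

def delta_theta(theta, data, lr=0.1):
    w = list(theta)
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        # one SGD step on squared error: w_i -= lr * (pred - y) * x_i
        w = [wi - lr * (pred - y) * xi for wi, xi in zip(w, x)]
    return [wn - wo for wn, wo in zip(w, theta)]

theta = [0.0, 0.0]                           # model owner's parameters
D = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]   # data owner's additional data
update = delta_theta(theta, D)               # proxies the benefit of D to T
```

The marketplace question is then who gets to see `theta`, `D`, and `update`, and at what cost.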
Data: Encrypt or Approximate
$D' \sim D \mid \Delta \Theta_T(D') \sim \Delta \Theta_T(D)$: encrypt or transform the data so that a sanitized $D'$ induces (nearly) the same update as $D$. This suffices for compliance and pure inference, but it is not private against black-box models if the updates are visible, and it requires customized networks.
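One way to read the criterion above: a sanitized $D'$ is acceptable if it induces nearly the same model update as $D$. A toy sketch under that assumption, with noise perturbation standing in for a real encryption or transformation scheme (helper names are illustrative):

```python
import random

def update(w, data, lr=0.1):
    # scalar linear model y ~ w * x, one SGD pass on squared error
    for x, y in data:
        w -= lr * (w * x - y) * x
    return w

def approximate(data, noise=0.01, seed=0):
    # stand-in for encryption/transformation: perturb the raw records
    rng = random.Random(seed)
    return [(x + rng.gauss(0, noise), y + rng.gauss(0, noise))
            for x, y in data]

D = [(1.0, 2.0), (2.0, 4.0)]
D_prime = approximate(D)      # what the data owner actually shares
gap = abs(update(0.0, D) - update(0.0, D_prime))
# small gap: Delta Theta_T(D') ~ Delta Theta_T(D), so D' stands in for D
```

The weakness noted above remains: if $\Delta \Theta_T(D')$ itself is visible to the model owner, it leaks information about $D$ regardless of how $D'$ was sanitized.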
Federation: Gains and Losses
 Distributed, collaborative learning.
 Differentially private update aggregation.
 Complicated setup; on-device training (especially useful for simple models); customized protocol design and optimization for integrating classifiers.
 Requires many users to be private (using random rotation, etc., to ensure privacy).
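The bullets above can be sketched as a single federated round: each user computes a local update, noise stands in for a calibrated differential-privacy mechanism, and the server sees only the noised average. All names and constants are illustrative:

```python
import random

def local_update(w, data, lr=0.1):
    for x, y in data:                 # scalar linear model y ~ w * x
        w -= lr * (w * x - y) * x
    return w

def federated_round(w_global, user_datasets, noise_std=0.01):
    # Gaussian noise here is a stand-in for a real DP mechanism;
    # with few users the noise needed for privacy would swamp the signal.
    updates = []
    for data in user_datasets:
        w_local = local_update(w_global, data)
        updates.append(w_local - w_global + random.gauss(0.0, noise_std))
    return w_global + sum(updates) / len(updates)

random.seed(0)
w = 0.0
users = [[(1.0, 2.0)], [(1.0, 2.2)], [(1.0, 1.8)]]  # three users' data
for _ in range(50):
    w = federated_round(w, users)
# w drifts toward the consensus slope of roughly 2.0
```

Note the mismatch with our setting: this protocol amortizes over many users and many rounds, which is exactly why it fits poorly with a one-time data transaction.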
Model approximation
$T' \sim T \mid \Delta \Theta_{T'}(D) \sim \Delta \Theta_T(D)$
Figure 3: A pricing function $P(T') : D \longrightarrow \mathbb{R}^+$ is composed.
Data: black-box $T$ with parameters $\Theta$, MDE $f$, data $D_{train}$, $D_{test}$, additional data $D$, ideal model size $t$.
Result: price of $D$ w.r.t. $T$
$T' \leftarrow f(T)$ // learn a decision tree as in [2]
while not $(\forall d^{[i]} \in D_{train}: \Delta \mathcal{L}_{test}(d^{[i]}, T') < \epsilon)$ do
    $\Theta' \leftarrow \Theta' + \Delta \Theta_{T'}(d^{[i]})$
end
while $sizeof(T') > t$ do
    trim or compress $T'$ // for optional encryption
end
Algorithm 1: Model extraction as an MDE that draws properties from any model. Black boxes can be handled in escrow; applies to interpretability and model testing. Trades accuracy for size; encrypt if tiny.
Discussion
 Useful data can be priced highly, and useless data minimally.
 Due to a mismatch in representation between training and test data (e.g., insufficient data), duplicate data may be priced for reducing error.
Solution: require $\forall d^{[i]} \in D_{train}: \Delta \mathcal{L}_{test}(d^{[i]}, T') < \epsilon$. That is, overfit the approximation to $D_{train}$ until it no longer prices duplicate data.
Conclusion
For trading additional training data fairly and practically, we introduce Model-Data Efficacy approaches, based on model approximation of black-box models, that price the data without training it on the original model.
Approximating the effect of data on the model through model approximation (Model-Data Efficacy) is a moderately practical solution for preserving model and data privacy. Model extraction, for example, can be used for fair pricing: useless data is priced minimally while useful data is priced highly.
| Approach | $D$ leakage | $T$ leakage | Practicality | Fairness | Examples |
|---|---|---|---|---|---|
| Giving up data | High | Low | High | Low | Default ML |
| Giving up model | Low | High | High | Low | Academic researchers |
| Escrow smart contract | Medium | Medium | Low | High | Numerai, Enigma |
| Encrypting the model | High | Low | Low | N/A | Corti, PySyft |
| Encrypting the data | Medium | Low | Low | Medium | Microsoft SEAL |
| Federated learning | Low | Low | Low | High | Google (for Android data) |
| Model-Data Efficacy | Low | Low | Medium | High | DeMoloch |
Against black-box models, encrypting or approximating data has flaws regarding privacy. While federated learning with differential privacy achieves privacy for both the model owner and the data owner, it is less practical for one-time transactions.
Future work
 Pre-training data synthesis from existing $D_{train}$ eliminates tuning on $D_{test}$ and refines usefulness into a metric for novelty.
 Stronger transactional security against adversarial attacks on the model owner.
References
 Aono, Yoshinori, et al. "Privacy-preserving deep learning via additively homomorphic encryption." IEEE Transactions on Information Forensics and Security 13.5 (2017): 1333-1345.
 Bastani, Osbert, Carolyn Kim, and Hamsa Bastani. "Interpretability via model extraction." arXiv preprint arXiv:1706.09773 (2017).