We present a blockchain-based system that allows data owners, cloud vendors,
and AI developers to collaboratively train machine learning models in a
trustless AI marketplace. Data is a highly valued digital asset and central to
deriving business insights. Our system enables data owners to retain ownership
and privacy of their data, while still allowing AI developers to leverage the
data for training. Similarly, AI developers can utilize compute resources from
cloud vendors without losing ownership or privacy of their trained models. Our
system protocols are set up to incentivize all three entities (data owners,
cloud vendors, and AI developers) to truthfully record their actions on the
distributed ledger, so that the blockchain provides verifiable evidence of
wrongdoing and supports dispute resolution. Our system is implemented on
Hyperledger Fabric and can provide a viable alternative to centralized AI
systems that do not guarantee data or model privacy. We present experimental
performance results that characterize the latency and throughput of the
system's transactions under different network configurations, in which peers
on the blockchain may be spread across different datacenters and geographies. Our
results indicate that the proposed solution scales well to a large number of
data and model owners and can train up to 70 models per second on a 12-peer
non-optimized blockchain network and roughly 30 models per second on a 24-peer
network.