Abstract
The rapid development of large machine learning (ML) models requires a
massive amount of training data, driving a booming demand for data sharing
and trading through data markets. Traditional centralized data markets suffer
from a low level of security, while emerging decentralized platforms face
efficiency and privacy challenges. In this paper, we propose OmniLytics+, the
first decentralized data market, built upon blockchain and smart contract
technologies, to simultaneously achieve 1) data (resp., model) privacy for the
data (resp., model) owner; 2) robustness against malicious data owners; and 3)
efficient data validation and aggregation. Specifically, adopting the
zero-knowledge (ZK) rollup paradigm, OmniLytics+ secret-shares the
encrypted local gradients, computed from the encrypted global model, among a set
of untrusted off-chain servers, which collaboratively generate a ZK proof of the
validity of the gradients. In this way, the storage and processing overheads are
securely offloaded from blockchain verifiers, significantly improving the
privacy, efficiency, and affordability over existing rollup solutions. We
implement the proposed OmniLytics+ data market as an Ethereum smart contract
[41]. Extensive experiments demonstrate the effectiveness of OmniLytics+ in
training large ML models in the presence of malicious data owners, and its
substantial advantages in gas cost and execution time over
baselines.
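
To make the secret-sharing step concrete, the following is a minimal sketch of plain additive secret sharing of gradient vectors among several servers. It is not the OmniLytics+ protocol: encryption of the gradients and the global model, the ZK validity proof, and the on-chain contract are all omitted. The field modulus `PRIME`, the fixed-point factor `SCALE`, and the function names are assumptions chosen for illustration only.

```python
import secrets

PRIME = 2**61 - 1   # field modulus (assumed for this sketch)
SCALE = 10**6       # fixed-point scaling factor (assumed for this sketch)


def encode(x: float) -> int:
    """Map a real value to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Map a field element back to a real value (upper half is negative)."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(gradient, n_servers):
    """Split each gradient entry into n_servers additive shares.

    Returns a list with one share vector per server. Any n_servers - 1
    of the share vectors are uniformly random and reveal nothing about
    the gradient on their own.
    """
    shares = [[0] * len(gradient) for _ in range(n_servers)]
    for j, g in enumerate(gradient):
        parts = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
        last = (encode(g) - sum(parts)) % PRIME
        for i, p in enumerate(parts + [last]):
            shares[i][j] = p
    return shares


def aggregate(per_owner_shares):
    """Reconstruct only the sum of all owners' gradients.

    Each server first adds up the shares it holds across owners; the
    per-server sums are then combined, so no individual gradient is
    ever reconstructed.
    """
    n_servers = len(per_owner_shares[0])
    dim = len(per_owner_shares[0][0])
    server_sums = [
        [sum(owner[i][j] for owner in per_owner_shares) % PRIME
         for j in range(dim)]
        for i in range(n_servers)
    ]
    return [decode(sum(s[j] for s in server_sums) % PRIME)
            for j in range(dim)]


# Two hypothetical data owners share their gradients among three servers.
owner_a = share([0.5, -1.25], n_servers=3)
owner_b = share([1.0, 0.25], n_servers=3)
total = aggregate([owner_a, owner_b])
```

In this sketch the servers learn only the aggregated gradient. Rejecting invalid contributions from malicious data owners would additionally require a validity check such as the ZK proof described in the abstract.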