Machine learning systems rely on data for training, as input, and for ongoing
feedback and validation. Data in the field can come from varied sources, often
anonymous or unknown to the ultimate users of the data. Whenever data is
sourced and used, its consumers need assurance that it is as accurate as
described and that it has been obtained legitimately, and they need to
understand the terms under which the data is made available so that they can
honour them. Similarly, suppliers of data require assurances that their data is
being used legitimately by authorised parties, in accordance with their terms,
and that usage is appropriately recompensed. Furthermore, both parties may want
to agree on a specific set of quality-of-service (QoS) metrics, which can be
used to negotiate service quality based on cost, and then to receive
confirmation that data is being supplied within the agreed QoS levels. Here we
present a
conceptual architecture which enables data sharing agreements to be encoded and
computationally enforced, remuneration to be made when required, and a trusted
audit trail to be produced for later analysis or reproduction of the
environment. Our architecture uses blockchain-based distributed ledger
technology, which can facilitate transactions in situations where parties do
not have an established trust relationship or centralised command and control
structures. We explore techniques to promote confidence in the accuracy of the
supplied data, and to let data users determine trade-offs between data quality
and cost. We illustrate the system with a case study that uses multiple data
sources from different parties to monitor traffic levels in urban locations.
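
As a rough illustration of what encoding an agreement, enforcing its QoS terms,
and keeping an audit trail could look like, the following minimal Python sketch
may help. It is not the architecture itself: the names Agreement, Ledger and
settle_delivery, the two QoS fields, and the flat price per delivery are all
hypothetical, and the in-memory hash chain only stands in for a distributed
ledger (it provides no consensus, replication or smart-contract execution).

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Agreement:
    """Hypothetical encoding of a data sharing agreement."""
    supplier: str
    consumer: str
    price_per_delivery: float  # remuneration owed when QoS is met
    max_latency_ms: float      # assumed QoS term: worst acceptable latency
    min_accuracy: float        # assumed QoS term: lowest acceptable accuracy


@dataclass
class Ledger:
    """In-memory hash-chained log; a stand-in for a distributed ledger."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append({"prev": prev_hash, "record": record, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edit to an earlier record breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def settle_delivery(agreement: Agreement, ledger: Ledger,
                    latency_ms: float, accuracy: float) -> float:
    """Check one delivery against the agreed QoS, log it, return the payment due."""
    within_qos = (latency_ms <= agreement.max_latency_ms
                  and accuracy >= agreement.min_accuracy)
    payment = agreement.price_per_delivery if within_qos else 0.0
    ledger.append({
        "supplier": agreement.supplier,
        "consumer": agreement.consumer,
        "latency_ms": latency_ms,
        "accuracy": accuracy,
        "within_qos": within_qos,
        "payment": payment,
    })
    return payment
```

In this sketch a delivery that misses either QoS threshold is still recorded
but earns no payment, and Ledger.verify() lets either party confirm afterwards
that the recorded history has not been altered. A real deployment would need to
decide how latency and accuracy are measured and who attests to them, which the
sketch leaves open.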