Machine learning is promising, but it often needs to process vast amounts of
sensitive data, which raises privacy concerns. In this white paper, we
introduce Substra, a distributed framework for privacy-preserving, traceable
and collaborative Machine Learning. Substra gathers data providers and
algorithm designers into a network of nodes that can train models on demand,
subject to advanced permission regimes. To guarantee data privacy, Substra
implements distributed learning: the data never leave their nodes; only
algorithms, predictive models and non-sensitive metadata are exchanged on the
network. The computations are orchestrated by a Distributed Ledger Technology,
which guarantees the traceability and authenticity of information without
requiring trust in a third party. Although originally developed for healthcare
applications, Substra is not specific to any type of data, algorithm or
programming language.
It supports many types of computation plans, including the parallel computation
plans commonly used in Federated Learning. With appropriate guidelines, it can
be deployed for numerous Machine Learning use cases involving data or algorithm
providers where trust is limited.
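
To make the idea of a parallel computation plan concrete, the following is a minimal sketch in plain Python and does not use Substra's API. It assumes a toy one-parameter linear model and uses sample-weighted averaging of the locally trained weights, a common Federated Learning aggregation rule chosen here purely for illustration. The names Node, local_train, aggregate and run_rounds are hypothetical. Each node trains on its own private samples, and only the updated model weight and a sample count are returned to the coordinator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical data provider; `samples` stays inside this object."""
    name: str
    samples: List[Tuple[float, float]]  # private (x, y) pairs

    def local_train(self, weight: float, lr: float = 0.05,
                    steps: int = 20) -> Tuple[float, int]:
        """Fit y ~ weight * x by gradient descent on the local samples.

        Only the updated weight and the sample count are returned;
        the raw samples are never exposed to the caller.
        """
        n = len(self.samples)
        for _ in range(steps):
            # Gradient of the mean squared error with respect to the weight.
            grad = sum(2 * (weight * x - y) * x for x, y in self.samples) / n
            weight -= lr * grad
        return weight, n


def aggregate(updates: List[Tuple[float, int]]) -> float:
    """Average the local weights, weighted by each node's sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def run_rounds(nodes: List[Node], rounds: int = 10,
               weight: float = 0.0) -> float:
    """Run several rounds: every node trains locally, then weights are averaged."""
    for _ in range(rounds):
        # Conceptually these local trainings run in parallel, one per node.
        updates = [node.local_train(weight) for node in nodes]
        weight = aggregate(updates)
    return weight


if __name__ == "__main__":
    # Assumed toy data: both nodes hold samples drawn from y = 3x.
    nodes = [
        Node("hospital_a", [(1.0, 3.0), (2.0, 6.0)]),
        Node("hospital_b", [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)]),
    ]
    print(run_rounds(nodes))
```

In this sketch the coordinator only ever sees pairs of (weight, sample count). In a real deployment the orchestration, permissions and traceability described above would be handled by the framework rather than by a single function call.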