Abstract
For computer software, our security models, policies, mechanisms, and means
of assurance were primarily conceived and developed before the end of the
1970s. Since that time, however, software has changed radically: it is
thousands of times larger, comprises countless libraries, layers, and services,
and is used for more purposes, in far more complex ways. It is therefore
worthwhile to revisit our core computer security concepts. For example, it is
unclear whether the Principle of Least Privilege can help dictate security
policy when software is too complex for either its developers or its users to
explain its intended behavior.
This paper outlines a model for software security that takes an empirical,
data-driven approach to modern software and determines its exact, concrete
behavior through comprehensive online monitoring. Specifically, this paper
briefly describes methods for efficient, detailed software monitoring; methods
for learning detailed software statistics while providing differential privacy
to the software's users; and, finally, how machine learning can help discover
users' expectations for intended software behavior and thereby help set
security policy. These methods can be adopted in practice, even at very large
scale, and demonstrate that data-driven software security models can provide
real-world benefits.
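To make "learning statistics while providing differential privacy" concrete, the following is a minimal sketch of classic randomized response, a standard local differential privacy mechanism. It is an assumption chosen for illustration, not necessarily the method this paper uses. In the hypothetical scenario, each user's client reports a single bit (for example, whether a program performed some monitored action), randomized before it leaves the device. The collector can still estimate the population-wide fraction. The names `randomize` and `estimate_fraction` are hypothetical.

```python
import math
import random


def randomize(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Client side: report the true bit with probability e^eps / (1 + e^eps),
    otherwise report its negation. This satisfies epsilon-local DP because the
    ratio of report probabilities for any two inputs is at most e^eps."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else not bit


def estimate_fraction(reports, epsilon: float) -> float:
    """Server side: unbiased estimate of the true fraction of 1-bits.

    With p = e^eps / (1 + e^eps) and true fraction f,
    E[observed] = p*f + (1 - p)*(1 - f), so f = (observed - (1 - p)) / (2p - 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    # Hypothetical simulation: 30% of 100,000 users exhibit the monitored behavior.
    rng = random.Random(0)
    n, true_fraction, eps = 100_000, 0.3, 1.0
    truth = [rng.random() < true_fraction for _ in range(n)]
    reports = [randomize(b, eps, rng) for b in truth]
    print(round(estimate_fraction(reports, eps), 3))
```

No individual report reveals a user's true bit with certainty, yet across many users the estimate concentrates near the true fraction. Smaller epsilon gives stronger privacy but a noisier estimate, because the divisor 2p - 1 shrinks toward zero.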