Machine Learning (ML) has developed rapidly and demonstrated superior
performance in many areas, such as computer vision and video and speech
recognition. It is now increasingly leveraged in software systems to
automate core tasks. However, securely developing modern machine
learning-based software systems (MLBSS) remains a major challenge, and
insufficient attention to it will largely limit their application in
safety-critical domains. One concern is that present MLBSS development
tends to be rushed, so latent vulnerabilities and privacy issues exposed to
external users and attackers are largely neglected and hard to
identify. Additionally, machine learning-based software systems exhibit
different susceptibilities to novel vulnerabilities at different development
stages, from requirements analysis to system maintenance, due to their inherent
limitations in the model and data and to external adversary capabilities.
Successfully building such intelligent systems will thus require
dedicated joint efforts from different research areas, i.e., software
engineering, system security, and machine learning. Most recent work
on security issues for ML focuses strongly on data and
models, which has brought adversarial attacks into consideration. In this work,
we consider that security issues for machine learning-based software systems
may arise from inherent system defects or external adversarial attacks, and
that secure development practices should be adopted throughout the whole
lifecycle. Although machine learning has become a new threat domain for existing
software engineering practices, no existing review covers this topic.
Overall, we present a holistic review of security for MLBSS, which
provides a systematic understanding through a structured review of three
distinct aspects in terms of security threats...