Protecting sensitive data is an essential part of security in cloud
computing. Because only specific privileged individuals may view or interact
with this data, depending on those same individuals to maintain the software
does not scale. One solution is to let non-privileged individuals maintain
these systems while masking sensitive information before it egresses. To this
end, we have created a machine-learning model that predicts and redacts fields
containing sensitive data. This work concentrates on Azure PowerShell and shows
how the approach applies to other command-line interfaces and APIs. Using the
F5-score as a weighted metric, we demonstrate different
transformation techniques to map this problem from an unknown field to the
well-researched area of natural language processing.
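
As an illustration of the weighted metric, the sketch below assumes that the F5-score denotes the standard F-beta score with beta = 5, which weights recall far more heavily than precision (a missed sensitive field costs more than an over-redacted one). The function name `f_beta` and the sample values are hypothetical and are not taken from this work.

```python
def f_beta(precision: float, recall: float, beta: float = 5.0) -> float:
    """Standard F-beta score; beta = 5 is an assumed reading of 'F5-score'."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1.0 + b2) * precision * recall / denominator


# Hypothetical values: the same pair of numbers scores very differently
# depending on which one is the recall.
high_recall = f_beta(precision=0.5, recall=1.0)
high_precision = f_beta(precision=1.0, recall=0.5)
print(f"high recall: {high_recall:.3f}, high precision: {high_precision:.3f}")
```

Under this assumption, a model that catches every sensitive field but over-redacts scores well above one that is precise but misses half of the sensitive fields.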
External dataset
manually labeled data set containing over 60,000 entries derived from 1,420 commands