Abstract
Background: Machine learning techniques are widely used and demonstrate
promising performance in many software security tasks such as software
vulnerability prediction. However, the class ratio within software
vulnerability datasets is often highly imbalanced (since the percentage of
observed vulnerabilities is usually very low). Goal: To help security
practitioners address class imbalance in software security data and
build better prediction models with resampled datasets. Method: We
introduce an approach called Dazzle which is an optimized version of
conditional Wasserstein Generative Adversarial Networks with gradient penalty
(cWGAN-GP). Dazzle explores the architecture hyperparameters of cWGAN-GP with a
novel optimizer called Bayesian Optimization. We use Dazzle to generate
minority class samples to resample the original imbalanced training dataset.
Results: We evaluate Dazzle with three software security datasets, i.e., Moodle
vulnerable files, Ambari bug reports, and JavaScript function code. We show
that Dazzle is practical to use and demonstrates promising improvement over
existing state-of-the-art oversampling techniques such as SMOTE (e.g., an
average improvement of about 60% in recall over SMOTE across all datasets).
Conclusion: Based on this study, we suggest optimized GANs as an alternative
method for addressing class imbalance in security vulnerability data.
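
The resampling step described under Method (generating minority class samples and adding them to the imbalanced training set) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `resample_with_generator` and `toy_generator` are hypothetical, the toy generator is only a placeholder for a trained conditional generator such as cWGAN-GP, and the 1:1 target class ratio is an assumption that the abstract does not state.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def resample_with_generator(
    X: Sequence[Row],
    y: Sequence[int],
    generate: Callable[[int, int], List[Row]],
    minority_label: int = 1,
) -> Tuple[List[Row], List[int]]:
    """Append synthetic minority rows until the minority class matches
    the largest class in size (assumed 1:1 target ratio)."""
    counts = Counter(y)
    majority_count = max(counts.values())
    n_needed = majority_count - counts[minority_label]
    # Ask the (hypothetical) conditional generator for the missing rows.
    synthetic = generate(n_needed, minority_label) if n_needed > 0 else []
    X_out = list(X) + synthetic
    y_out = list(y) + [minority_label] * len(synthetic)
    return X_out, y_out


def toy_generator(n: int, label: int, seed: int = 0) -> List[Row]:
    """Placeholder for a trained conditional generator: draws Gaussian
    noise around a fixed point that depends on the class label."""
    rng = random.Random(seed)
    center = 1.0 if label == 1 else 0.0
    return [[rng.gauss(center, 0.1), rng.gauss(center, 0.1)] for _ in range(n)]


# Toy imbalanced training set: four majority rows, one minority row.
X_train = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.0, 0.3], [1.0, 0.9]]
y_train = [0, 0, 0, 0, 1]

X_bal, y_bal = resample_with_generator(X_train, y_train, toy_generator)
```

Only the training split would be resampled in this way; the test split keeps its original class ratio so that evaluation reflects the real distribution.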