Abstract
Typical event datasets such as those used in network intrusion detection
comprise hundreds of thousands, sometimes millions, of discrete packet events.
These datasets tend to be high dimensional, stateful, and time-series in
nature, holding complex local and temporal feature associations. Packet data
can be abstracted into lower-dimensional summary data, such as packet flow
records, where some of the temporal complexities of packet data can be
mitigated, and smaller well-engineered feature subsets can be created. Such
summary data can be invaluable for training machine learning and cyber threat
detection techniques. Data can be collected in real time or from historical
packet trace archives. In this paper we focus on how flow records and summary
metadata can be extracted from packet data with high accuracy and robustness.
We identify limitations in current methods and examine how these flaws may
affect the resulting datasets and learning models. Finally, we propose methods to
improve the state of the art and introduce proof of concept tools to support
this work.