Abstract
This study investigates the efficacy of machine learning models in network
security threat detection through the critical lens of partial versus complete
flow information, addressing a common gap between research settings and
real-time operational needs. We systematically evaluate how a standard
benchmark model, Random Forest, performs under varying training and testing
conditions (complete/complete, partial/partial, complete/partial), quantifying
the performance impact when dealing with the incomplete data typical in
real-time environments. Our findings demonstrate a significant performance
difference, with precision and recall dropping by up to 30% under certain
conditions when models trained on complete flows are tested against partial
flows. The study also indicates that, for the evaluated dataset and model, test
flows need at least about 7 packets to maintain reliable detection rates. These
quantified results can inform more realistic real-time detection strategies.
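
As a minimal illustration of the complete versus partial distinction, the sketch below computes the same summary features from a whole flow and from only its first few packets. It assumes that a partial flow is defined by a packet-count cutoff; the function name `flow_features`, the feature set, and the packet sizes are hypothetical and are not taken from the study.

```python
from statistics import mean

def flow_features(packet_sizes, max_packets=None):
    """Summarise a flow from its first `max_packets` packets (all packets if None).

    Hypothetical feature set for illustration only; assumes a non-empty flow.
    """
    seen = packet_sizes if max_packets is None else packet_sizes[:max_packets]
    return {
        "packets": len(seen),      # number of packets observed so far
        "bytes": sum(seen),        # total bytes observed so far
        "mean_size": mean(seen),   # mean packet size over the observed prefix
    }

# Hypothetical packet sizes (bytes) for a single 10-packet flow.
flow = [60, 60, 52, 1500, 1500, 1500, 1500, 400, 52, 52]

complete = flow_features(flow)                 # features a post-hoc analysis would see
partial = flow_features(flow, max_packets=7)   # features available after 7 packets

print(complete)
print(partial)
```

A model trained on features like `complete` but queried with features like `partial` sees systematically different inputs, which is the complete/partial mismatch the evaluation quantifies.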