The ability to identify applications based on the network data they generate
could be a valuable tool for cyber defense. We report on a machine learning
technique capable of using netflow-like features to predict the application
that generated the traffic. In our experiments, we used ground-truth labels
obtained from host-based sensors deployed in a large enterprise environment; we
applied random forests and multilayer perceptrons to the tasks of browser vs.
non-browser identification, browser fingerprinting, and process name
prediction. For each of these tasks, we demonstrate how machine learning models
can achieve high classification accuracy using only netflow-like features as
the basis for classification.