Machine learning algorithms are gaining prominence in traffic
identification research because they overcome the shortcomings of
port-based identification and deep packet inspection, especially for P2P-based Skype traffic.
However, recent studies have focused mainly on traffic identification based on
full-packet datasets, which makes online identification of network traffic
challenging. This study proposes a new flow identification algorithm that
takes sampled flow records as its input. The study constructs flow
records from a Skype set as the dataset, considers the inherent NetFlow and
extended flow metrics as features, and uses a fast correlation-based filter
(FCBF) algorithm to select features that are highly correlated with the
traffic class. The study also proposes a new
NFI method that adopts a Bayesian updating mechanism to improve the classifier
model. Experimental results show that the proposed scheme achieves much
better identification performance than existing state-of-the-art traffic
identification methods. A typical feature metric is also analyzed in the
sampling environment. Compared with other methods, the NFI method improves
identification accuracy and reduces both false positives and false negatives.
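
The feature selection step mentioned above can be illustrated with a minimal
sketch of a fast correlation-based filter. This is not the study's
implementation: it assumes the flow features have already been discretized,
and the feature names and toy values in the usage note are hypothetical. It
ranks features by symmetric uncertainty with the class label, then drops any
feature that is at least as correlated with an already selected feature as it
is with the class.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x = entropy(x)
    h_y = entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy     # mutual information
    return 2.0 * info_gain / (h_x + h_y)


def fcbf(features, labels, threshold=0.0):
    """Return the names of the selected features.

    features: dict mapping a feature name to a list of discrete values
    labels:   list of class labels, one per flow record
    threshold: minimum SU with the class for a feature to be considered
    """
    relevance = {
        name: symmetric_uncertainty(column, labels)
        for name, column in features.items()
    }
    # Consider the most class-relevant features first.
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Keep a feature only if no already selected feature is at least
        # as correlated with it as it is with the class.
        redundant = any(
            symmetric_uncertainty(features[kept], features[name])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

With toy data such as `fcbf({"pkt_size_bin": [1, 1, 0, 0], "pkt_size_copy":
[1, 1, 0, 0], "noise": [0, 1, 0, 1]}, ["skype", "skype", "other", "other"])`,
the duplicated feature is dropped as redundant and the uninformative one is
dropped as irrelevant, leaving a single selected feature.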