Classifying network traffic is the basis for important network applications.
Prior research in this area has been hampered by the limited availability of
representative datasets, and many of its results cannot be readily reproduced.
This problem is exacerbated by emerging data-driven, machine-learning-based
approaches. To address this issue, we provide three open datasets containing
almost 1.3M labeled flows in total, with flow features and anonymized raw
packets, for the research community. We cover broad aspects of network
traffic analysis, including both malware detection and application
classification. We release the datasets in the form of an open challenge called
NetML and implement several machine learning methods, including random forest,
SVM, and MLP. As we continue to grow NetML, we expect the datasets to serve as a
common platform for AI-driven, reproducible research on network flow analytics.