With the widespread adoption of Artificial Intelligence (AI)-enabled security
applications, there is a need to collect heterogeneous and scalable data
sources for effectively evaluating the performance of these applications.
This paper describes new datasets, named the ToN IoT datasets, that include
distributed data sources collected from Telemetry datasets of
Internet of Things (IoT) services, Operating systems datasets of Windows and
Linux, and datasets of Network traffic. The paper aims to describe the new
testbed architecture used to collect the Linux datasets from audit traces of
hard disks, memory, and processes. The architecture was designed in three distributed
layers of edge, fog, and cloud. The edge layer comprises IoT and network
systems, the fog layer includes virtual machines and gateways, and the cloud
layer includes data analytics and visualization tools connected with the other
two layers. The layers were programmatically controlled using Software-Defined
Networking (SDN) and Network Function Virtualization (NFV) through the VMware NSX
and vCloud NFV platforms. The Linux ToN IoT datasets can be used to train and
validate various new federated and distributed AI-enabled security solutions
such as intrusion detection, threat intelligence, privacy preservation, and
digital forensics. Various data analytics and machine learning methods are
employed to assess the fidelity of the datasets by examining
feature engineering, statistics of legitimate and security events, and
the reliability of security events. The datasets can be publicly accessed from [1].