Analysis of cyber-relevant data has become an area of increasing focus. As
more businesses and governments come to understand the
implications of cyberattacks, the impetus for better cybersecurity solutions
has increased. Unfortunately, current cybersecurity datasets either offer no
ground truth or provide it only with anonymized data. The former makes results
difficult to verify, and the latter can remove valuable information. Additionally,
most existing datasets are large enough to make them unwieldy during prototype
development. In this paper, we present the PicoDomain dataset, a compact,
high-fidelity collection of Zeek logs from a realistic intrusion using relevant
Tools, Techniques, and Procedures. Although simulated on a small-scale network,
this dataset consists of traffic typical of an enterprise network and can be
used for rapid validation and iterative development of analytics platforms.
We have validated this dataset using traditional statistical analysis and
off-the-shelf machine learning techniques.