Stream deinterleaving is an important problem with various applications in
the cybersecurity domain. In this paper, we consider the specific problem of
deinterleaving DNS data streams using machine-learning techniques, with the
objective of automating the extraction of malware domain sequences. We first
develop a generative model for user request generation and DNS stream
interleaving. Based on this model, we evaluate several inference strategies for
deinterleaving, including augmented hidden Markov models (HMMs) and long
short-term memory networks (LSTMs), on synthetic datasets. Our
results demonstrate that state-of-the-art LSTMs outperform more traditional
augmented HMMs in this application domain.