Fuzzing consists of repeatedly testing an application with modified, or
fuzzed, inputs with the goal of finding security vulnerabilities in
input-parsing code. In this paper, we show how to automate the generation of an
input grammar suitable for input fuzzing using sample inputs and
neural-network-based statistical machine-learning techniques. We present a
detailed case study with a complex input format, namely PDF, and a large,
complex, security-critical parser for this format, namely the PDF parser
embedded in Microsoft's new Edge browser. We discuss (and measure) the tension
between conflicting learning and fuzzing goals: learning wants to capture the
structure of well-formed inputs, while fuzzing wants to break that structure in
order to cover unexpected code paths and find bugs. We also present a new
algorithm for this learn&fuzz challenge that uses a learnt input probability
distribution to intelligently guide where to fuzz inputs.
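
To make the last idea concrete, the following is a minimal sketch of one way a learnt probability distribution could guide where to fuzz. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: a toy character-bigram model stands in for the neural network, and the names train_bigram, sample_fuzz, fuzz_rate and confidence are hypothetical. The assumed rule is that an input is generated character by character from the model, and wherever the model is highly confident in the sampled character, that character is occasionally replaced by the least likely one, so that the anomaly lands where a parser most expects well-formed structure.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical sentinels; assumed absent from the samples


def train_bigram(samples):
    """Toy stand-in for the learnt model: P(next char | previous char)."""
    counts = defaultdict(Counter)
    for s in samples:
        for prev, nxt in zip(START + s, s + END):
            counts[prev][nxt] += 1
    return {
        prev: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for prev, ctr in counts.items()
    }


def sample_fuzz(model, rng, fuzz_rate=0.2, confidence=0.8, max_len=200):
    """Generate one input from the model, fuzzing only at confident positions.

    Assumed rule (not taken from the paper): if the sampled character has
    probability >= confidence, then with probability fuzz_rate it is replaced
    by the least likely character under the same distribution.
    """
    out, prev = [], START
    while len(out) < max_len:
        dist = model[prev]
        chars = list(dist)
        c = rng.choices(chars, weights=[dist[x] for x in chars])[0]
        if dist[c] >= confidence and rng.random() < fuzz_rate:
            c = min(dist, key=dist.get)  # least likely continuation
        if c == END:
            break
        out.append(c)
        prev = c
    return "".join(out)


if __name__ == "__main__":
    # Made-up PDF-like object fragments, used only as training samples.
    samples = ["<< /Type /Page >>", "<< /Type /Font >>", "<< /Length 42 >>"]
    model = train_bigram(samples)
    rng = random.Random(0)
    for _ in range(3):
        print(repr(sample_fuzz(model, rng)))
```

The two knobs expose the tension described above: fuzz_rate = 0 reproduces pure learning (every generated input follows the learnt structure), while larger values trade well-formedness for unexpected inputs. A bigram table assigns zero probability to unseen transitions, so the toy model can only swap in characters it has already seen in that context; a model with a softmax output would give every character nonzero probability, and the least likely choice would be a more disruptive one.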