In recent years, the Portable Document Format, commonly known as PDF, has
become a ubiquitous standard for document exchange and dissemination. This
trend is due to characteristics such as its flexibility and
portability across platforms. The widespread use of PDF has instilled a false
impression of inherent safety among ordinary users. However, the characteristics
of PDF have motivated attackers to exploit various types of vulnerabilities and
overcome security safeguards, thereby making the PDF format one of the most
effective attack vectors for malicious code. Therefore, efficiently detecting malicious PDF
files is crucial for information security. Several analysis techniques, whether
static or dynamic, have been proposed in the literature to extract the main
features that allow malicious files to be discriminated from benign ones. Since
classical analysis techniques may be limited in the case of zero-day attacks,
machine-learning-based techniques have recently emerged as an automatic
PDF-malware detection approach that is able to generalize from a set of training
samples. These techniques themselves face the challenge of evasion
attacks, in which a malicious PDF is transformed to look benign. In this work, we
give an overview of the PDF-malware detection problem and offer a perspective on
the new challenges and emerging solutions.