Vulnerability identification is crucial for protecting software systems from
cyber attacks. It is especially important to localize the vulnerable functions
in the source code so that they can be fixed. However, this is a challenging
and tedious process that requires specialized security expertise. Inspired by
work on manually defined vulnerability patterns over various code
representation graphs and by recent advances in graph neural networks, we
propose Devign, a general graph neural network based model for
graph-level classification through learning on a rich set of code semantic
representations. It includes a novel Conv module that efficiently extracts
useful features from the learned rich node representations for graph-level
classification. The model is trained on manually labeled datasets built from
four diverse, large-scale open-source C projects that capture the high
complexity and variety of real source code, instead of the synthetic code used
in previous
work. Extensive evaluation on these datasets demonstrates that Devign
significantly outperforms the state of the art, with an average of 10.51%
higher accuracy and 8.68% higher F1 score; the Conv module contributes an
average increase of 4.66% in accuracy and 6.37% in F1.
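
To make the idea of graph-level classification concrete, the following is a minimal sketch of the general pipeline described above: node states are refined by passing messages along the edges of a code graph, a convolution plus max pooling condenses them into one vector per graph, and a linear layer scores the function as vulnerable or not. It is an illustration under simplifying assumptions, not the Devign implementation: the averaging update stands in for whatever propagation the model actually uses, the readout is only a generic 1-D convolution, and all names (`propagate`, `conv_readout`, `classify`) and parameter values are hypothetical.

```python
import math


def propagate(features, edges, steps=2):
    """Simplified message passing (an assumption, not the paper's update rule).

    Each node adds the mean state of its in-neighbours to its own state
    and applies tanh, repeated for a fixed number of steps.
    """
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for src, dst in edges:
        neighbours[dst].append(src)

    states = [list(f) for f in features]
    for _ in range(steps):
        new_states = []
        for v in range(n):
            dim = len(states[v])
            if neighbours[v]:
                msg = [
                    sum(states[u][d] for u in neighbours[v]) / len(neighbours[v])
                    for d in range(dim)
                ]
            else:
                msg = [0.0] * dim
            new_states.append([math.tanh(s + m) for s, m in zip(states[v], msg)])
        states = new_states
    return states


def conv_readout(states, kernel):
    """1-D convolution along the node axis, then ReLU and max pooling.

    Returns one pooled value per feature dimension, i.e. a fixed-size
    graph-level vector regardless of the number of nodes.
    """
    k = len(kernel)
    if len(states) < k:
        raise ValueError("graph has fewer nodes than the kernel width")
    pooled = []
    for d in range(len(states[0])):
        column = [s[d] for s in states]
        windows = [
            sum(w * column[i + j] for j, w in enumerate(kernel))
            for i in range(len(column) - k + 1)
        ]
        pooled.append(max(max(0.0, x) for x in windows))
    return pooled


def classify(features, edges, kernel, weights, bias):
    """Return a score in (0, 1); higher means 'more likely vulnerable'."""
    pooled = conv_readout(propagate(features, edges), kernel)
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy graph: three nodes with 2-d features, chained 0 -> 1 -> 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
score = classify(features, edges, kernel=[0.5, 0.5], weights=[1.0, -1.0], bias=0.0)
```

In a trained model the kernel, weights, and bias would be learned from the labeled functions; here they are fixed by hand purely to show the data flow from graph to a single prediction.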