A graph-based sampling and consensus (GraphSAC) approach is introduced to
effectively detect anomalous nodes in large-scale graphs. Existing approaches
rely on connectivity and attributes of all nodes to assign an anomaly score per
node. However, nodal attributes and network links might be compromised by
adversaries, rendering these holistic approaches vulnerable. Alleviating this
limitation, GraphSAC randomly draws subsets of nodes, and relies on graph-aware
criteria to judiciously filter out sets contaminated by anomalous nodes, before
employing a semi-supervised learning (SSL) module to estimate nominal label
distributions per node. These learned nominal distributions are minimally
affected by the anomalous nodes, and hence can be directly adopted for anomaly
detection. Rigorous analysis provides performance guarantees for GraphSAC, by
bounding the required number of draws. The per-draw complexity grows linearly
with the number of edges, which implies efficient SSL, while draws can be run
in parallel, thereby ensuring scalability to large graphs. GraphSAC is tested
under different anomaly generation models based on random walks, clustered
anomalies, as well as contemporary adversarial attacks for graph data.
Experiments with real-world graphs showcase the advantage of GraphSAC relative
to state-of-the-art alternatives.