Abstract
With the rapid expansion of web-based applications and cloud services,
malicious JavaScript code continues to pose significant threats to user
privacy, system integrity, and enterprise security. However, detecting such threats
remains challenging due to sophisticated code obfuscation techniques and
JavaScript's inherent language characteristics, particularly its nested closure
structures and syntactic flexibility. In this work, we propose DeCoda, a hybrid
defense framework that combines large language model (LLM)-based deobfuscation
with code graph learning: (1) We first construct a sophisticated
prompt-learning pipeline with multi-stage refinement, where the LLM
progressively reconstructs the original code structure from obfuscated inputs
and then generates normalized Abstract Syntax Tree (AST) representations; (2)
In JavaScript ASTs, dynamic typing scatters semantically similar nodes while
deeply nested functions fracture scope capturing, introducing structural noise
and semantic ambiguity. To address these challenges, we then learn
hierarchical code graph representations via a Cluster-wise Graph that
integrates a graph transformer network, node clustering, and
node-to-cluster attention to capture both local node-level
semantics and global cluster-induced structural relationships from the AST graph.
Experimental results show that our method achieves F1-scores of 94.64%
and 97.71% on two benchmark datasets, absolute improvements of
10.74 and 13.85 percentage points over state-of-the-art baselines. In false-positive control
evaluation at fixed FPR levels (0.0001, 0.001, 0.01), our approach delivers
4.82, 5.91, and 2.53 higher TPR, respectively, compared to the best-performing
baseline. These results highlight the effectiveness of LLM-based deobfuscation
and underscore the importance of modeling cluster-level relationships in
detecting malicious code.
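
The fixed-FPR evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not describe the exact evaluation protocol, so this shows only one common way to compute TPR at a fixed FPR budget: sweep the decision threshold from the highest score downward and keep the best TPR whose FPR stays within the budget. The function name `tpr_at_fpr` and the toy scores and labels are hypothetical and are not taken from the paper.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    scores: detector outputs, higher means "more likely malicious".
    labels: ground truth, 1 = malicious, 0 = benign.
    (Hypothetical helper for illustration; not the paper's code.)
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Rank samples by descending score, then lower the threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Samples with the same score fall on the same side of any threshold,
        # so they are counted together.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows as the threshold drops further
        best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Toy data (made up): 4 malicious and 4 benign samples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
for budget in (0.0, 0.25, 1.0):
    print(budget, tpr_at_fpr(toy_scores, toy_labels, budget))
```

At very small budgets such as 0.0001, the result depends on only a handful of benign samples, so a meaningful estimate requires a large benign test set.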