Abstract
Progress in machine learning is improving the detection of malicious URLs.
However, advanced Transformers applied to URLs face difficulties in extracting
local information, character-level details, and structural relationships. To
address these challenges, we propose a novel approach for malicious URL
detection, named TransURL. This method is implemented by co-training the
character-aware Transformer with three feature modules: Multi-Layer Encoding,
Multi-Scale Feature Learning, and Spatial Pyramid Attention. This specialized
Transformer enables TransURL to extract embeddings with character-level
information from URL token sequences, with the three modules aiding the fusion
of multi-layer Transformer encodings and the capture of multi-scale local
details and structural relationships. The proposed method is evaluated across
several challenging scenarios, including class imbalance learning,
multi-classification, cross-dataset testing, and adversarial sample attacks.
Experimental results demonstrate a significant improvement compared to previous
methods. For instance, it achieved a peak F1-score improvement of 40% in
class-imbalanced scenarios and surpassed the best baseline by 14.13% in
accuracy for adversarial attack scenarios. Additionally, a case study
demonstrated that our method accurately identified all 30 active malicious web
pages, whereas two previous state-of-the-art methods missed 4 and 7 malicious
web pages, respectively. The code and data are available at:
https://github.com/Vul-det/TransURL/.
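
The roles of the three feature modules can be illustrated with a minimal sketch. It is not the TransURL implementation: the function names, the weighted-sum fusion rule, the window sizes, and the pyramid levels are all assumptions made for illustration, and plain average pooling stands in for the attention mechanism. The sketch only shows how multi-layer encodings might be fused, how local context at several scales might be gathered, and how a pyramid of region summaries might capture structure at different granularities.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
TokenSeq = List[Vector]  # one vector per URL token


def fuse_layers(layers: Sequence[TokenSeq], weights: Sequence[float]) -> TokenSeq:
    """Hypothetical multi-layer fusion: weighted average of per-layer encodings."""
    total = sum(weights)
    seq_len, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layers)) / total
         for d in range(dim)]
        for t in range(seq_len)
    ]


def multi_scale_features(seq: TokenSeq, window_sizes: Tuple[int, ...] = (1, 2, 3)) -> TokenSeq:
    """Hypothetical multi-scale step: for each position, concatenate the mean
    vector over forward windows of several widths (truncated at the end)."""
    seq_len, dim = len(seq), len(seq[0])
    out = []
    for t in range(seq_len):
        feats: Vector = []
        for k in window_sizes:
            window = seq[t:min(t + k, seq_len)]
            feats.extend(sum(v[d] for v in window) / len(window) for d in range(dim))
        out.append(feats)
    return out


def pyramid_pool(seq: TokenSeq, levels: Tuple[int, ...] = (1, 2, 4)) -> Vector:
    """Pyramid pooling stand-in: average the sequence over 1, 2, and 4 regions
    and concatenate the region summaries (no attention weights are learned)."""
    seq_len, dim = len(seq), len(seq[0])
    pooled: Vector = []
    for bins in levels:
        for b in range(bins):
            start = (b * seq_len) // bins
            end = max(start + 1, ((b + 1) * seq_len) // bins)
            chunk = seq[start:end]
            pooled.extend(sum(v[d] for v in chunk) / len(chunk) for d in range(dim))
    return pooled


# Toy input: two "Transformer layers", four tokens, two dimensions each.
layers = [[[1.0, 0.0] for _ in range(4)], [[3.0, 2.0] for _ in range(4)]]
fused = fuse_layers(layers, weights=[1.0, 1.0])
local = multi_scale_features(fused)
summary = pyramid_pool(local)
```

In this toy run, `summary` is a fixed-length vector regardless of URL length, which is the property that makes pyramid-style pooling convenient as input to a classifier head.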