TransURL: Improving malicious URL detection with multi-layer Transformer encoding and multi-scale pyramid features

arXiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2312.00508

PDF

https://arxiv.org/pdf/2312.00508

Bibliographic information

Authors: Ruitong Liu, Yanbin Wang, Zhenhao Guo, Haitao Xu, Zhan Qin, Wenrui Ma, Fan Zhang
Published: 2023-12-01
Updated: 2025-03-21
Affiliation: Department of Engineering, Shenzhen MSU-BIT University
Country: China
Venue: Comput. Networks

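AI-estimated labels

Malicious website detection, Watermarking, URL analysis methods

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".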

Abstract

Machine learning progress is advancing the detection of malicious URLs. However, advanced Transformers applied to URLs face difficulties in extracting local information, character-level details, and structural relationships. To address these challenges, we propose a novel approach for malicious URL detection, named TransURL. This method is implemented by co-training the character-aware Transformer with three feature modules: Multi-Layer Encoding, Multi-Scale Feature Learning, and Spatial Pyramid Attention. This specialized Transformer enables TransURL to extract embeddings with character-level information from URL token sequences, with the three modules aiding the fusion of multi-layer Transformer encodings and the capture of multi-scale local details and structural relationships. The proposed method is evaluated across several challenging scenarios, including class imbalance learning, multi-classification, cross-dataset testing, and adversarial sample attacks. Experimental results demonstrate a significant improvement compared to previous methods. For instance, it achieved a peak F1-score improvement of 40% in class-imbalanced scenarios and surpassed the best baseline by 14.13% in accuracy for adversarial attack scenarios. Additionally, a case study demonstrated that our method accurately identified all 30 active malicious web pages, whereas two previous state-of-the-art methods missed 4 and 7 malicious web pages, respectively. The code and data are available at: https://github.com/Vul-det/TransURL/.
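The abstract names the modules but does not describe their internals. As a rough illustration of the two generic ideas in the title, fusing encodings from several Transformer layers and pooling token features at multiple scales, the Python sketch below combines an ELMo-style softmax-weighted layer mix with 1-D spatial pyramid max pooling. It is not TransURL's Multi-Layer Encoding or Spatial Pyramid Attention module (the authors' implementation is in the linked repository); the functions `fuse_layers` and `pyramid_max_pool`, the pyramid levels, and the toy inputs are all hypothetical.

```python
# Hypothetical illustration of multi-layer fusion + multi-scale pyramid pooling.
# This is NOT the TransURL implementation; names and defaults are assumptions.
import math
from typing import List, Sequence

Vector = List[float]
Tokens = Sequence[Sequence[float]]  # shape: num_tokens x feature_dim


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs: Sequence[Tokens],
                layer_logits: Sequence[float]) -> List[Vector]:
    """Softmax-weighted sum of per-layer token encodings (ELMo-style scalar mix).

    layer_outputs[l][t][d] is feature d of token t from encoder layer l.
    """
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("need one logit per encoder layer")
    weights = softmax(layer_logits)
    num_tokens = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layer_outputs))
         for d in range(dim)]
        for t in range(num_tokens)
    ]


def pyramid_max_pool(features: Tokens,
                     levels: Sequence[int] = (1, 2, 4)) -> Vector:
    """1-D spatial pyramid max pooling over a token sequence.

    At each level n the sequence is split into n contiguous bins, and each bin
    is max-pooled per feature dimension. Concatenating every bin of every level
    yields dim * sum(levels) values, independent of the URL's token count.
    """
    if not features:
        raise ValueError("features must contain at least one token vector")
    length, dim = len(features), len(features[0])
    pooled: Vector = []
    for n in levels:
        for b in range(n):
            start = (b * length) // n
            # Integer ceil of (b + 1) * length / n; max() keeps short inputs non-empty.
            end = max(start + 1, -((-(b + 1) * length) // n))
            window = features[start:end]
            pooled.extend(max(vec[d] for vec in window) for d in range(dim))
    return pooled


if __name__ == "__main__":
    # Hypothetical toy input: 2 encoder layers x 4 URL tokens x 2 features.
    layer_a = [[2.0, 0.0], [1.0, 4.0], [6.0, 2.0], [0.0, 8.0]]
    layer_b = [[0.0, 0.0]] * 4
    fused = fuse_layers([layer_a, layer_b], layer_logits=[0.0, 0.0])
    print(fused)                                   # [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [0.0, 4.0]]
    print(pyramid_max_pool(fused, levels=(1, 2)))  # [3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
```

Pyramid pooling is used here because it maps a variable-length URL token sequence to a fixed-length vector while keeping coarse positional structure (which part of the URL a strong feature came from). In a trained model, the layer logits and token features would be learned rather than fixed by hand.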

External datasets

GramBeddings

Mendeley

Kaggle 1

Kaggle 2