SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

Authors: Hillel Ohayon, Daniel Gilkarov, Ran Dubin
Published: 2026-02-23

Source: https://arxiv.org/abs/2602.19818

PDF: https://arxiv.org/pdf/2602.19818

Labels Predicted by AI

Model Extraction Attack, Malware Detection, Malware Detection Method

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

Model repositories such as Hugging Face increasingly distribute machine learning artifacts serialized with Python’s pickle format, exposing users to remote code execution (RCE) risks during model loading. Recent defenses, such as PickleBall, rely on per-library policy synthesis that requires complex system setups and verified benign models, which limits scalability and generalization. In this work, we propose a lightweight, machine-learning-based scanner that detects malicious Pickle-based files without policy generation or code instrumentation. Our approach statically extracts structural and semantic features from Pickle bytecode and applies supervised and unsupervised models to classify files as benign or malicious. We construct and release a labeled dataset of 727 Pickle-based files from Hugging Face and evaluate our models on four datasets: our own, PickleBall (out-of-distribution), Hide-and-Seek (9 advanced evasive malicious models), and synthetic joblib files. Our method achieves 90.01
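The static approach can be made concrete with opcode-level feature extraction. The following minimal sketch uses Python's standard-library pickletools module. It assumes raw pickle bytes as input. A PyTorch zip archive would first need its embedded data.pkl extracted. The denylist SUSPICIOUS_GLOBALS, the function extract_features, and the chosen features are hypothetical illustrations. They are not SafePickle's actual feature set or code. The output is the kind of feature dictionary a supervised or unsupervised classifier could consume. The models themselves are not shown.

```python
import collections
import pickle
import pickletools

# Hypothetical denylist for illustration only; NOT SafePickle's feature set.
SUSPICIOUS_GLOBALS = {
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("__builtin__", "eval"),  # name written by protocols 0-2 with fix_imports
    ("os", "system"),
    ("posix", "system"),
    ("nt", "system"),
    ("subprocess", "Popen"),
}

# Opcodes that push a str onto the stack; used to recover STACK_GLOBAL operands.
STRING_OPCODES = {"UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}


def extract_features(data: bytes) -> dict:
    """Parse pickle opcodes statically; nothing is unpickled or executed."""
    counts = collections.Counter()
    imports = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        counts[opcode.name] += 1
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the import inline as "module name".
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ pushes module and name as strings first. Using the
            # last two strings is a simplification: memo lookups (BINGET etc.)
            # can also supply them, which this sketch does not track.
            imports.append((recent_strings[-2], recent_strings[-1]))
    return {
        "n_opcodes": sum(counts.values()),
        "n_distinct_opcodes": len(counts),
        "n_reduce": counts["REDUCE"],  # REDUCE calls a callable on load
        "n_build": counts["BUILD"],
        "n_imports": len(imports),
        "n_suspicious_imports": sum(imp in SUSPICIOUS_GLOBALS for imp in imports),
        "imports": imports,
    }


class Payload:
    """Illustrative malicious object: unpickling it would call eval."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


malicious = pickle.dumps(Payload(), protocol=4)
benign = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)

print(extract_features(malicious)["imports"])  # [('builtins', 'eval')]
print(extract_features(benign)["n_suspicious_imports"])  # 0
```

The design relies on pickletools.genops only parsing the opcode stream. It never resolves or calls the imported objects, so scanning cannot trigger the payload. A production scanner would also emulate the memo to resolve STACK_GLOBAL operands reliably, and would handle obfuscation of the kind targeted by evasive samples.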