Metadata-Based Detection of Child Sexual Abuse Material

TOP 文献データベース Metadata-Based Detection of Child Sexual Abuse Material

arxiv

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2010.02387

PDF

https://arxiv.org/pdf/2010.02387

文献情報

作者: Mayana Pereira;Rahul Dodhia;Hyrum Anderson;Richard Brown
公開日: 2020-10-6
更新日: 2021-10-28
所属機関: Microsoft Corporation, AI for Good Research Lab, Redmond, WA, 98052
所属の国: United States of America
会議名: IEEE Trans. Dependable Secur. Comput.

AIにより推定されたラベル

機械学習モデル評価データセット評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Child Sexual Abuse Media (CSAM) is any visual record of a sexually-explicit activity involving minors. CSAM impacts victims differently from the actual abuse because the distribution never ends, and images are permanent. Machine learning-based solutions can help law enforcement quickly identify CSAM and block digital distribution. However, collecting CSAM imagery to train machine learning models has many ethical and legal constraints, creating a barrier to research development. With such restrictions in place, the development of CSAM machine learning detection systems based on file metadata uncovers several opportunities. Metadata is not a record of a crime, and it does not have legal restrictions. Therefore, investing in detection systems based on metadata can increase the rate of discovery of CSAM and help thousands of victims. We propose a framework for training and evaluating deployment-ready machine learning models for CSAM identification. Our framework provides guidelines to evaluate CSAM detection models against intelligent adversaries and models' performance with open data. We apply the proposed framework to the problem of CSAM detection based on file paths. In our experiments, the best-performing model is based on convolutional neural networks and achieves an accuracy of 0.97. Our evaluation shows that the CNN model is robust against offenders actively trying to evade detection by evaluating the model against adversarially modified data. Experiments with open datasets confirm that the model generalizes well and is deployment-ready.