Abstract
Standardized datasets and benchmarks have spurred innovations in computer
vision, natural language processing, and multi-modal and tabular settings. We
note that, compared to other well-researched fields, fraud detection poses
unique challenges: high class imbalance, diverse feature types, frequently
changing fraud patterns, and the adversarial nature of the problem. Because of
these challenges, modeling approaches evaluated on datasets from other research
fields may not work well for fraud detection. In this paper, we introduce the
Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets
catered to fraud detection. FDB comprises a variety of fraud-related tasks,
ranging from identifying fraudulent card-not-present transactions, detecting
bot attacks, classifying malicious URLs, and estimating the risk of loan
default to content moderation. The Python-based library for FDB provides a
consistent API for data
loading with standardized training and testing splits. We demonstrate several
applications of FDB that are of broad interest for fraud detection, including
feature engineering, comparison of supervised learning algorithms, label noise
removal, class-imbalance treatment, and semi-supervised learning. We hope that
FDB provides a common playground for researchers and practitioners in the fraud
detection domain to develop robust and customized machine learning techniques
targeting various fraud use cases.
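To make one of the listed applications concrete: the sketch below illustrates a simple baseline for class-imbalance treatment, one of the use cases the benchmark targets. It does not use the FDB API; it stands in a synthetic, heavily imbalanced dataset (scikit-learn's `make_classification`) for a fraud task, and uses loss reweighting via `class_weight='balanced'` as the treatment.

```python
# Illustrative sketch (not the FDB API): class-imbalance treatment on a
# synthetic stand-in for a fraud task with ~1% positive (fraud) labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Generate a heavily imbalanced binary classification problem.
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' reweights the loss inversely to class frequency,
# a common first-line treatment for high class imbalance.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# Average precision (area under the precision-recall curve) is a more
# informative metric than accuracy under severe class imbalance.
scores = clf.predict_proba(X_te)[:, 1]
print(f"average precision: {average_precision_score(y_te, scores):.3f}")
```

With a standardized train/test split from a benchmark like FDB, the same pattern lets different imbalance treatments (reweighting, resampling, thresholding) be compared on equal footing.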