In the field of fraud detection, the availability of comprehensive and
privacy-compliant datasets is crucial for advancing machine learning research
and developing effective anti-fraud systems. Traditional datasets often focus
on transaction-level information, which is useful but misses the broader
context of customer behavior patterns that are essential for detecting
sophisticated fraud schemes. The scarcity of customer-level data, primarily due to
privacy concerns, significantly hampers the development and testing of
predictive models that can operate effectively at the customer level.
Addressing this gap, our study introduces a benchmark that contains structured
datasets specifically designed for customer-level fraud detection. The
benchmark not only adheres to strict privacy guidelines to ensure user
confidentiality but also provides a rich source of information by encapsulating
customer-centric features. It also supports the
comprehensive evaluation of various machine learning models, facilitating a
deeper understanding of their strengths and weaknesses in predicting fraudulent
activities. This work aims to narrow the gap in data
availability, giving researchers and practitioners a resource for
developing next-generation fraud detection techniques.