Abstract
Credit cards play a rapidly growing role in modern economies. Their popularity
and ubiquity have created fertile ground for fraud, assisted by cross-border
reach and instantaneous confirmation. While transaction volumes are growing, the
fraud rate is also on the rise, as is the true cost of each dollar of fraud.
The volume of transactions, the uniqueness of frauds and the ingenuity of
fraudsters are the main challenges in detecting fraud. The advent of machine
learning, artificial intelligence and big data has opened up new tools in the
fight against fraud.
Given past transactions, a machine learning algorithm can 'learn' highly
complex characteristics in order to identify fraud in real time, surpassing the
best human investigators. However, the development of fraud detection
algorithms has been challenging and slow due to the massively imbalanced nature
of fraud data, the absence of benchmarks and standard evaluation metrics for
identifying better-performing classifiers, the lack of sharing and disclosure
of research findings, and the difficulty of getting access to confidential
transaction data for research. This work investigates the
properties of typical massively imbalanced fraud data sets and their availability
and suitability for research use, while exploring the widely varying nature of fraud
distributions. Furthermore, we show how human annotation errors compound with
machine classification errors. We also carry out experiments to determine the
effect of PCA obfuscation (as a means of disseminating sensitive transaction
data for research and machine learning) on the algorithmic performance of
classifiers and show that, while PCA does not significantly degrade performance,
care should be taken to use an appropriate number of principal components
(dimensions) to avoid overfitting.
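
The point about massively imbalanced data and the need for suitable evaluation metrics can be illustrated with a minimal sketch. The labels below are synthetic, and the 10-in-10,000 fraud ratio and the hypothetical detector's hit counts are assumptions chosen for illustration; they are not figures from this work. The sketch compares a trivial classifier that never flags fraud with a detector that catches most frauds at the cost of some false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = fraud, 0 = legitimate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Return accuracy, precision and recall for the fraud class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when nothing is flagged or no fraud exists.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Synthetic labels (assumption): 10 frauds among 10,000 transactions.
y_true = [1] * 10 + [0] * 9990

# Trivial classifier: labels every transaction as legitimate.
always_legit = [0] * 10000

# Hypothetical detector: catches 8 of the 10 frauds and raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 9970

baseline = metrics(y_true, always_legit)
flagged = metrics(y_true, detector)

print("always legitimate:", baseline)
print("hypothetical detector:", flagged)
```

In this setup the trivial classifier scores higher on accuracy than the detector while catching no fraud at all, which is why accuracy alone is a poor basis for comparing classifiers on data this skewed.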