How do you generally detect fraud using analytics?


There are two broad families of algorithms that can help you detect fraud: 1) classification (supervised) and 2) clustering (unsupervised).


Now it’s a fair assumption that fraud is pretty rare and shows up as an outlier in your data. In other words, it’s an anomaly, and the process of identifying such anomalies is called Anomaly Detection.

So under classification, there are algorithms that specialize in “anomaly” detection, like one-class SVM and PCA-based anomaly detection. Try them out on your dataset and see if they are able to capture the “anomalies” in it. While you are at it, don’t discount traditional classification algorithms either; they may be useful as well. These algorithms have to be trained on known examples (labeled fraud and non-fraud transactions for a traditional classifier, or mostly normal transactions for a one-class model), and that’s why they are called “supervised”.
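
To make the one-class SVM idea a bit more concrete, here is a minimal sketch using scikit-learn. Everything about the data is an assumption for illustration: the two features (amount and hour of day), the synthetic "normal" transactions, the three hand-made suspicious ones, and the `nu` value are all made up, so treat it as a starting point rather than a recipe.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical features: [transaction amount, hour of day].
# Synthetic "normal" behaviour, used only for illustration.
normal = rng.normal(loc=[50.0, 14.0], scale=[15.0, 3.0], size=(500, 2))

# A few hand-made transactions that look nothing like the training data.
suspicious = np.array([[900.0, 3.0], [1200.0, 2.0], [750.0, 4.0]])

# Scale features so that amount does not dominate the RBF kernel.
scaler = StandardScaler().fit(normal)

# nu is roughly the share of training points allowed to fall outside
# the learned boundary; 0.02 is an arbitrary choice here.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.02)
model.fit(scaler.transform(normal))

# predict() returns +1 for inliers and -1 for outliers.
flags = model.predict(scaler.transform(suspicious))

# Share of the training data that the model itself treats as outliers.
train_outlier_rate = float(np.mean(model.predict(scaler.transform(normal)) == -1))
```

On a real dataset you would fit on transactions you believe are mostly legitimate, then review whatever comes back as `-1`. The `nu` parameter is the main knob: a higher value flags more transactions.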

There’s an alternative approach, which is to use unsupervised algorithms called “clustering” techniques. You could try something as simple as K-means or something more sophisticated. I haven’t used clustering much for solving fraud problems and have usually defaulted to anomaly detection algorithms for this. But I am throwing this out there to make sure you know all the options! I can see these algorithms being applied to exploratory analysis, where you are just exploring your data to find outliers and study them.
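
For the exploratory use case, one simple way to get outliers out of K-means is to look at how far each point sits from its assigned cluster centre. The sketch below assumes scikit-learn, synthetic two-dimensional data, two clusters, and a 99th-percentile cutoff; all of these are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Two synthetic groups of "typical" records plus two odd ones at the end.
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2))
odd_points = np.array([[2.5, 9.0], [-4.0, 6.0]])
X = np.vstack([cluster_a, cluster_b, odd_points])

# No labels are used: K-means only groups similar records together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to the centre of the cluster it was assigned to.
distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Flag the points that sit unusually far from their centre.
threshold = np.percentile(distances, 99)
outlier_idx = np.where(distances > threshold)[0]
```

The indices in `outlier_idx` are candidates to study by hand, not confirmed fraud. A percentile cutoff always flags a fixed share of the data, so some perfectly ordinary points will show up alongside the genuinely odd ones.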

Hope that helps!


What do you think? Leave a comment below.