How to Detect Credit Card Fraud Using Machine Learning

by Craig Wisneski, January 19, 2022

Credit card fraud accounted for 393,207 of the nearly 1.4 million reports of identity theft in 2020. That’s a lot of people’s lives upended by fraudsters, and it's only getting worse. During the COVID-19 pandemic, credit card fraud surged around 35% globally. The global cost of credit card fraud is estimated to be nearly $30 billion USD per year.

Given the complexity of credit card fraud, it has become increasingly difficult to manage. This is where machine learning (ML) can help.

Credit card fraud has many different risk factors, each of which is associated with a particular attack vector. For example, card skimming involves using a hidden surveillance device to record data from the magnetic stripe on cards as they are swiped through readers. Other attack vectors include counterfeit cards, stolen credentials, and account takeover.

In this article, you'll learn how machine learning can be used to detect the various attack vectors of credit card fraud. We'll start with a high-level overview by looking at some common fraud scenarios, and then look at how machine learning can be used to identify them.

What is Machine Learning?

Machine learning is an application of AI in which computers can learn without being explicitly programmed. It's a complicated process that involves data inputs, algorithms, and model building.

For example, when you browse Netflix, the movie recommendations you see are a product of machine learning. Netflix's recommendation engine is built using ML, and it learns by analyzing billions of data points from its users.

In the context of credit card fraud, machine learning algorithms work similarly. They can be used to identify risk factors that signal a fraudulent transaction. The goal is to detect fraud in real time, thus protecting both the cardholder and the financial institution.

Various machine learning techniques can be used to detect fraudulent credit card transactions. Many of these methods fall under the umbrella of “anomaly detection,” because fraud data is heavily imbalanced: there are far more legitimate transactions than fraudulent ones. In other words, fraudulent transactions are anomalous when compared to regular transactions, but they aren’t necessarily statistical outliers.

Models like Support Vector Machines (SVMs) can be used even when there’s class imbalance, and can work with unlabeled data through extensions like OneClassSVM. Another technique is Principal Component Analysis (PCA), which reduces a large set of variables to a smaller number of “principal components” that capture most of the variation in the data and may be indicative of the problem at hand.
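To make these two techniques more concrete, here is a minimal sketch using scikit-learn's OneClassSVM and PCA classes. It is an illustration only, not how any particular fraud product works: the data is synthetic, and the five feature columns (amount, hour of day, distance from home, merchant risk score, item count) are assumptions invented for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def make_legit_transactions(n=500, seed=0):
    """Synthetic legitimate transactions (all feature names are hypothetical)."""
    rng = np.random.default_rng(seed)
    # Columns: amount, hour of day, km from home, merchant risk score, item count
    return rng.normal(
        loc=[50.0, 14.0, 5.0, 0.2, 2.0],
        scale=[20.0, 4.0, 3.0, 0.1, 1.0],
        size=(n, 5),
    )


def fit_detector(legit):
    """Scale the features, then learn the boundary of 'normal' behavior."""
    scaler = StandardScaler().fit(legit)
    detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
    detector.fit(scaler.transform(legit))
    return scaler, detector


legit = make_legit_transactions()
scaler, detector = fit_detector(legit)

# PCA compresses the five scaled features into two principal components.
pca = PCA(n_components=2).fit(scaler.transform(legit))
components = pca.transform(scaler.transform(legit))
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())

# A transaction far outside the usual range on every feature.
suspicious = np.array([[5000.0, 3.0, 900.0, 0.95, 40.0]])
label = detector.predict(scaler.transform(suspicious))[0]  # +1 normal, -1 anomaly
print("Flagged as anomaly:", label == -1)
```

The one-class model never sees a fraud label. It only learns what typical transactions look like, which is why it suits data where confirmed fraud examples are scarce.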

Suppose we have an ML algorithm that is trained on data from past credit card frauds. It's trained to recognize patterns in this data — for example, if a transaction looks like an account takeover, it will mark it as such. 

Once this algorithm has been run on a corpus of test cases (a set of labeled examples it has not seen during training), we can evaluate it by checking how many fraudulent transactions it correctly identifies and how many legitimate ones it wrongly flags. The more fraud it catches with fewer false alarms, the better.

Finally, the fraud detection system needs to be deployed on real-world online transactions, so that credit card companies can detect and prevent fraud cases in real time.

What are the different types of credit card fraud?

Credit card fraud comes in many forms, and it's important to understand these attack vectors so we can build an effective ML model. We'll look at the most common ones below.

Identity theft to gain access to accounts

Identity theft is one of the most common forms of credit card fraud. It happens when someone uses your personal information (such as your name, address, and date of birth) to open accounts in your name.

In 2020, there were nearly 1.4 million reports of identity theft, and credit card fraud surged during the pandemic. Fraudulently opened accounts may then be used to make purchases that later result in chargebacks. The attacker's goal is often to drain the account until it's empty. In turn, banks are increasingly turning to machine learning to detect credit card fraud.

To detect identity theft, we can look at activity on an account over time. We can also use machine learning to identify patterns in transaction data that signal identity theft, such as large purchases in a short period of time or very low dollar amounts compared with previous transactions. 
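As a rough illustration of the two signals just mentioned, the sketch below flags a new transaction when spending is packed into a short window or when the amount is far below the cardholder's usual spend. The Transaction class, the flag_transaction function, and every threshold are hypothetical choices made for this example. A trained model would learn such patterns from data instead of relying on fixed rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Transaction:
    timestamp: datetime
    amount: float


def flag_transaction(history, new_tx, window=timedelta(hours=1),
                     burst_multiple=5.0, tiny_fraction=0.05):
    """Return a list of reasons the new transaction looks unusual."""
    reasons = []
    if not history:
        return reasons  # nothing to compare against yet
    typical = median(tx.amount for tx in history)

    # Signal 1: heavy spending packed into a short window.
    recent = [tx.amount for tx in history
              if timedelta(0) <= new_tx.timestamp - tx.timestamp <= window]
    if sum(recent) + new_tx.amount > burst_multiple * typical:
        reasons.append("spending burst")

    # Signal 2: an amount far below this cardholder's usual spend.
    if new_tx.amount < tiny_fraction * typical:
        reasons.append("unusually small amount")
    return reasons


# Ten ordinary daily purchases of roughly $40 to $50 (made-up data).
start = datetime(2022, 1, 1, 12, 0)
history = [Transaction(start + timedelta(days=d), 40.0 + d) for d in range(10)]
now = start + timedelta(days=10)

print(flag_transaction(history, Transaction(now, 900.0)))
print(flag_transaction(history, Transaction(now, 0.5)))
print(flag_transaction(history, Transaction(now, 42.0)))
```

Using the median as the "typical" amount keeps one earlier large purchase from distorting the baseline.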

This type of pattern indicates that something fishy is going on — perhaps the cardholder's account has been compromised and is being used maliciously. We can also perform risk-based monitoring to determine whether or not a particular cardholder is at high risk of identity theft.

Fake applications for credit cards

Instead of impersonating a real person, an attacker may attempt to use a synthetic identity to obtain a credit card. This can happen in several ways.

An attacker may create an identity on their own computer and submit it to the card issuer for approval. They might also use stolen personal information to create a false identity (i.e., using data from someone else's identity theft). The attacker may even try to buy identities on the black market, where they will purchase stolen data from people who have already been victimized by identity theft. 

In fact, there’s an entire digital black market for identity data. Beyond credit card fraud, this is used to hack emails, reroute mail, apply for loans and mortgages, and even access savings accounts.

Duplicating or theft of credit cards 

A lot of credit card fraud relates to the use of real, legitimate cards that are stolen or duplicated. These cards are then used to make fraudulent purchases, which leads to fraudulent charges on the cardholder's account.

This type of fraud is rampant and can occur in several ways.

Credit cards can be stolen through break-ins, and card details such as PIN codes can be intercepted during online shopping. Cards can also be lost or stolen in transit, for example during mail delivery.

Cards can also be duplicated using a skimmer, a device installed on a card reader that captures data from each card swiped through it. This information is then used to make fraudulent purchases in the cardholder's name. There are around 100,000 cases of counterfeit cards and duplicate cards each year.

Duplicate Transactions

A duplicate transaction happens when someone copies a legitimate transaction to create an additional, synthetic transaction that is hard to tell apart from the original.

We can identify this attack by looking at the purchasing pattern of a fraudulent account. We can also see whether or not there are any suspicious purchase patterns that correlate with account takeover attacks.
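One simple way to surface candidate duplicates is sketched below: group transactions by card, merchant, and amount, then look for pairs that occur close together in time. The field names (id, card_id, merchant, amount_cents, time) and the five-minute window are assumptions for illustration. Real duplicates may differ slightly from the original, so exact matching like this is only a starting point.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_duplicates(transactions, window=timedelta(minutes=5)):
    """Return ID pairs that share card, merchant, and amount and
    occur within `window` of each other."""
    groups = defaultdict(list)
    for tx in transactions:
        # Amounts are integer cents so equality checks are exact.
        key = (tx["card_id"], tx["merchant"], tx["amount_cents"])
        groups[key].append(tx)

    pairs = []
    for txs in groups.values():
        ordered = sorted(txs, key=lambda tx: tx["time"])
        # Compare each transaction with the next one in time order.
        for earlier, later in zip(ordered, ordered[1:]):
            if later["time"] - earlier["time"] <= window:
                pairs.append((earlier["id"], later["id"]))
    return pairs


t0 = datetime(2022, 1, 19, 9, 30)
transactions = [
    {"id": "a1", "card_id": "c-1", "merchant": "grocer",
     "amount_cents": 2599, "time": t0},
    {"id": "a2", "card_id": "c-1", "merchant": "grocer",
     "amount_cents": 2599, "time": t0 + timedelta(seconds=40)},
    {"id": "a3", "card_id": "c-1", "merchant": "grocer",
     "amount_cents": 2599, "time": t0 + timedelta(days=7)},
    {"id": "b1", "card_id": "c-2", "merchant": "grocer",
     "amount_cents": 2599, "time": t0},
]
print(find_duplicates(transactions))
```

The same purchase a week later (a3) is not flagged, since a repeat visit to the same store is ordinary behavior.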

Hacking accounts

Many people use the same passwords for multiple accounts, making it easy for hackers who compromise one account to get into the others. If a person's account information is compromised, the hacker can use that information to open new fraudulent accounts in the victim's name.

If a large number of people have been victims of account takeovers, it could be a sign that there is a wider problem with cybersecurity and not just identity theft.

Which are the top ML solutions in 2021?

Let’s understand credit card fraud detection using machine learning by exploring the top ML tools.

Akkio

Akkio is an end-to-end no-code AI platform. This means you can create and deploy AI models, all in one place, without any technical expertise. Usually, software engineers and other technical professionals need to integrate the models that are created. With Akkio, it’s easy to integrate AI into any workflow by building an “AI flow,” powered by a fully visual interface.

The platform can identify duplicate transactions, identity theft, card duplication, and other types of fraudulent activity with ease. All you need is the historical data, and you can build a model in clicks. Beyond credit card fraud detection using machine learning, Akkio can be used for tasks like churn prediction, attrition prediction, or even lead scoring.

Prevision

Prevision is a no-code AI solution that helps increase the productivity of your data science projects. In other words, you’re expected to already be on the AI journey, and have some technical capability. 

Prevision is also focused on AI modeling, and not the end-to-end process that would include AI integration into your business workflow. You’d typically follow four steps: Uploading data, training a model, analyzing performance, and creating predictions.

Gyana

Gyana is similar to Prevision, in that there’s a straightforward, visual process to analyze data, but there’s no end-to-end no-code system to integrate those models into your workflow. Gyana is a good fit for basic modeling needs.

However, for things like fraud detection, you'll need a more comprehensive modeling toolkit.

Why Akkio?

Akkio is a no-code AI platform for real-time decision making that’s easy to use, scalable, and affordable. It's easy to use for fraud prevention, but has similar capabilities for other use cases, such as lead scoring, churn, and attrition prediction.

Fraud prevention with Akkio is simple. You upload a historical dataset of transactions, including a column indicating whether the transaction is fraudulent. Using this information, Akkio can build a model that predicts whether or not a new credit card transaction will be fraudulent based on the pattern of transactions that came before it. 

Once you've built your model, you can use it for fraud prevention in real time, so you never miss another fraudulent purchase!

Previously, machine learning required hiring expensive data scientists and engineers to build the model, train it for several weeks, and then use it to make decisions. This technical talent would need hands-on artificial intelligence expertise, as well as knowledge of tools like Python and pandas and of algorithms ranging from k-nearest neighbors (KNN) to logistic regression and random forest classifiers. With Akkio's visual interface, you can build a fraud prevention model in minutes, without needing to be a computer science expert.

Further, legacy AI platforms like Google Cloud require you to pay for training time, even if you don't end up with a usable model. With Akkio, you only pay when your model is used.

Most importantly, small businesses and nonprofits don't have the resources to hire data scientists and engineers to build ML models. So, they need a tool that is easy to use and affordable. Akkio can help you leverage AI without breaking the bank.

How does fraud detection work and how to implement it?

Akkio’s no-code AI can be used for credit card fraud detection using machine learning. Akkio deploys a number of complex data science and data mining techniques to learn from transaction data, such as random forest, decision trees, neural networks, and deep learning. 

What's more, Akkio can perform fraud detection in real time, with new data and metrics merged into the model continuously. The model can also be trained on any amount of historical data to improve accuracy.

Let’s look at a step-by-step example of using Akkio for credit card fraud detection. You’ll see this demo upon signing up for a free trial of Akkio. The demo uses a dataset of historical transactions, including a column named “Fraud?”, which is 0 for regular transactions and 1 for fraudulent transactions.

You can also upload your own dataset in just a click, and Akkio will automatically read in your training set, create what’s called a “train_test_split,” test out a series of machine learning models, calculate the metrics behind a confusion matrix, and more.
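For readers curious what a train/test split involves, here is a minimal standard-library sketch. It does not show Akkio's internal implementation, which the article does not describe. The stratified_split function and the 20% test fraction are assumptions for illustration. The split is stratified, meaning each label keeps the same share in both sets, which matters when fraud cases are rare.

```python
import random


def stratified_split(rows, label_key="Fraud?", test_fraction=0.2, seed=42):
    """Split rows into train and test sets, preserving the label ratio."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # Hold out at least one example of every label for testing.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Made-up dataset: 990 regular transactions and 10 fraudulent ones.
rows = ([{"amount": 20.0 + i, "Fraud?": 0} for i in range(990)]
        + [{"amount": 900.0 + i, "Fraud?": 1} for i in range(10)])
train, test = stratified_split(rows)

print(len(train), len(test))
print(sum(r["Fraud?"] for r in train), sum(r["Fraud?"] for r in test))
```

The model is fitted on the training rows only, and the held-out test rows are used to estimate how it will behave on transactions it has never seen.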

The Akkio AI Flow showing a dataset of credit card transactions.

This is a good example of an imbalanced dataset, where there are far more cases on one label (not fraudulent) than another (fraudulent). This is a tricky machine learning classification problem, since it’s easy to get high accuracy by simply guessing “not fraudulent” every time.
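The accuracy trap is easy to reproduce. In the sketch below, the numbers are invented for illustration (20 fraudulent transactions out of 10,000), and the "model" is a baseline that labels every transaction as not fraudulent.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# 1 = fraudulent, 0 = not fraudulent (made-up class balance).
y_true = [1] * 20 + [0] * 9980

# A "model" that guesses "not fraudulent" every single time.
always_legit = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_legit)
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, always_legit))

print(f"Accuracy: {baseline_accuracy:.1%}")
print(f"Fraudulent transactions caught: {frauds_caught} of {sum(y_true)}")
```

The baseline scores very high on accuracy while catching no fraud at all, which is why accuracy alone says little about a fraud model.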

Akkio uses a number of methods in the background, such as regularization, to keep the model from simply predicting the majority class. After selecting the “Fraud?” column, we get a model with high predictive accuracy for both cases of fraud and not fraud, as seen below.

The Akkio AI Flow showing the predictive accuracy of a fraud detection model.

We can also see the precision, recall, and F1 of the model, which are metrics that indicate how well the model did on this dataset. 

Precision is the ratio of correctly predicted positive observations to all predicted positive observations. In this case, the question this metric answers is: Of all transactions that were labeled as fraudulent, how many were actually fraudulent? High precision means few false positives.

Recall is the ratio of correctly predicted positive observations to all observations that are actually positive. In this case, we answer the question: Of all transactions that were truly fraudulent, how many did we flag?

Finally, the F1 score is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account.
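These three metrics can be computed directly from the counts in a confusion matrix. The sketch below uses only the standard library. The function names and the sample labels are made up for illustration, with 1 meaning fraudulent and 0 meaning not fraudulent. F1 is computed here as the harmonic mean of precision and recall.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Four real frauds; the model flags four transactions, three correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot earn a high F1 score by doing well on precision while neglecting recall, or the reverse.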

We can now conduct credit card fraud detection using machine learning, and deploy our fraud detection model in any setting directly via API or with the no-code tool Zapier.

Conclusion

Credit card fraud is an intractable problem that is only getting worse. This is because the entire financial system relies on people's trust in their cards.

There are many different forms of fraud, but machine learning can help us find patterns among the many types of credit card transactions to identify fraudulent behavior.

We hope this overview has given you a better sense of how machine learning can be used to detect fraud in your organization. If you're interested in trying out these tools for yourself, sign up for a free trial of Akkio’s scalable, easy-to-use platform, or check out our applications page for simple tutorials.
