ML project to better identify fraudulent transactions for Experian

The Challenge

Experian hired Calligo to apply machine learning to Experian's data, enhancing the Experian Product Link score so that it better identifies fraudulent transactions.

Experian had built a new ecommerce product that operates at the point of sale to determine how likely a transaction is to be fraudulent. It takes the customer's data, matches it to a credit card in the Experian database, and returns a score for each attribute based on how well the provided information matches the data in the Experian database.

The original model produced a “score” built on summary analytics and assumptions, with manually created values and calculations. The test clients found the information presented to them too hard to interpret and needed both more detail and simpler explanations to support their decisions to accept or refer transactions.

Experian needed a data scientist to improve their score and make it more interpretable to enable the sales team to better sell the product and give confidence to the companies that are using the model results to make decisions.

The Action

Calligo used a combination of Bayesian trees fitted across multiple transactional profiles to assess the likelihood of fraud for each transaction. The scores were combined with an assessment of each customer’s risk and transaction profiles to identify likely cases of fraud.

Following our Data Science by Design methodology, we found that each of the customers had different fraud rates and transaction profiles, which made it harder to build a single machine learning model that would generalize across the population. To address this, we analyzed the statistical distributions of each data set and built an individual Bayesian tree model for each client independently. We then combined these models into a single tree that considered the underlying fraud rate and number of observations at each node. The meta model penalized inconsistencies across the datasets and could be adjusted by changing the weight of each of the customer trees to fit the different risk profiles of new clients.
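The combination step can be illustrated with a minimal sketch. The formula used in the project is not described here, so everything below is an assumption made for illustration: the NodeStats structure, the Beta-style smoothing, the weighting by client weight times observation count, and the penalty that shrinks the combined estimate toward "no change from baseline" when clients disagree.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class NodeStats:
    """Counts seen at one tree node for one client (hypothetical structure)."""
    n_obs: int        # transactions that reached this node
    n_fraud: int      # of those, how many were fraudulent
    base_rate: float  # the client's overall fraud rate


def smoothed_lift(stats, prior_strength=50.0):
    """Smoothed node fraud rate divided by the client's baseline rate.

    Assumption: a Beta prior centered on the client's baseline, worth
    `prior_strength` pseudo-observations, so sparse nodes stay near 1.0.
    """
    rate = (stats.n_fraud + prior_strength * stats.base_rate) / (
        stats.n_obs + prior_strength
    )
    return rate / stats.base_rate


def combine_node(per_client, client_weights, penalty=1.0):
    """Combine per-client lifts at one node into a single lift estimate."""
    lifts = {c: smoothed_lift(s) for c, s in per_client.items()}
    # Each client counts in proportion to its weight and its observations.
    w = {c: client_weights[c] * per_client[c].n_obs for c in per_client}
    total = sum(w.values())
    if total == 0:
        return 1.0  # no evidence at this node: leave the baseline unchanged
    mean = sum(w[c] * lifts[c] for c in lifts) / total
    var = sum(w[c] * (lifts[c] - mean) ** 2 for c in lifts) / total
    # Penalize inconsistency: the more clients disagree, the more the
    # combined lift is pulled back toward 1.0.
    shrink = 1.0 / (1.0 + penalty * sqrt(var))
    return 1.0 + (mean - 1.0) * shrink


# Illustrative numbers only, not project data.
a = NodeStats(n_obs=1000, n_fraud=30, base_rate=0.01)
b = NodeStats(n_obs=1000, n_fraud=5, base_rate=0.01)
agree = combine_node({"a": a, "b": a}, {"a": 1.0, "b": 1.0})
disagree = combine_node({"a": a, "b": b}, {"a": 1.0, "b": 1.0})
```

In this sketch, raising a client's entry in client_weights pulls the combined estimate toward that client's behavior, which is one way to picture tuning the model for a new client with a similar risk profile.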

The model output was an increase or decrease in fraud likelihood for each transaction relative to a defined baseline fraud rate. Additionally, clients could see the impact of each feature on the prediction and the match rating of each attribute. Not only was the output easily interpretable, but it also gave clients more information to inform their criteria for accepting or referring transactions.
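As a rough illustration of output expressed relative to a baseline, the sketch below converts a predicted fraud rate into a percentage increase or decrease. The function names and the wording of the message are hypothetical and are not taken from the product.

```python
def relative_change(predicted_rate, baseline_rate):
    """Fractional change of a predicted fraud rate versus the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (predicted_rate - baseline_rate) / baseline_rate


def describe(predicted_rate, baseline_rate):
    """Render the change as a short, readable message (hypothetical wording)."""
    change = relative_change(predicted_rate, baseline_rate)
    direction = "increase" if change >= 0 else "decrease"
    return f"{abs(change):.0%} {direction} in fraud likelihood vs. baseline"


# Illustrative values only.
message_up = describe(0.5, 0.25)
message_down = describe(0.125, 0.25)
```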

Additional analysis could be supplied to individual clients to help them understand how the model could best be applied to their unique transactions and to provide insights beyond the data available in the model, such as transaction size. The model was designed to be customized for individual clients and to provide insights based on their unique risk and transaction profiles.

The Impact

The pilot program was successful and initial client tests yielded positive results.

The new product is currently being tested with a small set of clients. Initial feedback has been positive, with clients excited about the increased explainability of the model and the new support for their decision-making.

Client-specific analysis showed that 40% of all transactions could be accepted automatically, reducing the oversight needed for low-risk transactions. High-risk transactions could be identified automatically and flagged for manual review. Transactions flagged by the model had a fraud rate 6x higher than non-flagged transactions.
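Figures of this kind can be derived from a set of scored transactions with known outcomes. The sketch below shows one way to compute the auto-accept share and the ratio of fraud rates between flagged and non-flagged transactions. The thresholds, the function name, and the toy data are assumptions for illustration and do not reproduce the project's analysis.

```python
def triage_summary(transactions, accept_below, flag_above):
    """Summarize a threshold-based triage of scored transactions.

    transactions: list of (score, is_fraud) pairs, where score is the
    model's fraud score and is_fraud is the known outcome.
    Assumes accept_below <= flag_above (both thresholds are hypothetical).
    """
    def fraud_rate(group):
        return sum(1 for _, is_fraud in group if is_fraud) / len(group) if group else 0.0

    accepted = [t for t in transactions if t[0] < accept_below]
    flagged = [t for t in transactions if t[0] >= flag_above]
    unflagged = [t for t in transactions if t[0] < flag_above]

    flagged_rate = fraud_rate(flagged)
    unflagged_rate = fraud_rate(unflagged)
    ratio = flagged_rate / unflagged_rate if unflagged_rate > 0 else float("inf")
    return {
        "auto_accept_share": len(accepted) / len(transactions) if transactions else 0.0,
        "flagged_fraud_rate": flagged_rate,
        "unflagged_fraud_rate": unflagged_rate,
        "fraud_rate_ratio": ratio,
    }


# Toy data, not project results: 10 transactions as (score, is_fraud).
toy = (
    [(0.2, False)] * 5
    + [(1.0, False)] * 2
    + [(1.0, True)]
    + [(3.0, True), (3.0, False)]
)
summary = triage_summary(toy, accept_below=0.5, flag_above=2.0)
```

Moving either threshold trades manual review workload against the share of fraud caught, which is why the cut-offs would be chosen per client.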

Initial analysis estimated that each client could save over $1 million a year using this product.