Transaction Classification Overview

Marc Deveaux

7 min read · Nov 21, 2023

Sources

Building Payment Classification Models From Rules and Crowdsourced Labels: A Case Study by Artem Mateush, Rajesh Sharma, Marlon Dumas, Veronika Plotnikova, Ivan Slobozhan, and Jaan Ubi
H. Bengtsson and J. Jansson. Using classification algorithms for smart suggestions in accounting systems. Master Thesis, Chalmers University of Technology Gothenburg, Sweden, 2015.
O. E. E. Folkestad and E. E. N. Vollset. Automatic classification of bank transactions. Master Thesis, Norwegian University of Science and Technology, Trondheim, 2017.
W. Etaiwi, M. Biltawi, and G. Naymat. Evaluation of classification algorithms for banking customers behavior under apache spark data processing system. Procedia Computer Science, 113:559–564, 2017.
L. B. Skeppe. Classify swedish bank transactions with early and late fusion techniques. Master Thesis, KTH, Sweden, 2014.

Introduction

A basic ingredient to build a deep understanding of expenditure patterns is to be able to classify Consumer-to-Business (C2B) payments across product categories
C2B transactions are the most relevant when determining expenditure patterns
Payment classification is a difficult problem because of the large and evolving set of businesses and the fact that each business may offer multiple types of products, e.g. a business may sell both food and electronics

You have 3 main approaches

Rule based

Advantages: A set of rules is maintained (typically bootstrapped by domain experts) in order to map each payment record to a category.
Risks: Not scalable as it requires rules to be maintained for every business and type of transaction

Machine Learning on transactions labeled by the customers (crowdsourcing)

Advantages: A positive feedback loop lets the model improve over time
Risks: Labels can be inconsistent, and the approach is hard to bootstrap since it requires many customers to manually label their transactions for an extended period of time

Hybrid

Advantages: in real-world classification problems, it is necessary to combine rule-based classification with machine learning so as to maintain high precision (and improve recall) as the system evolves over time
Risks: In our case, asking customers to manually label their transactions is out of scope

Example found where a hybrid approach is employed: “A set of rules is used to bootstrap a financial planner that allowed customers to view their transactions classified with respect to 66 categories, and to add labels to unclassified transactions or to re-label transactions. The crowdsourced labels, together with the initial rule set, are then used to train a machine learning model” (Building Payment Classification Models From Rules and Crowdsourced Labels: A Case Study)

Hierarchy Category and Sub Categories

It seems common to use a hierarchy to categorize transactions. The first level often has around 10–15 categories. The second level is then strongly expanded, ranging from 66 sub categories in the above hybrid example to 500 in VISA MCC

Two examples of categories & sub categories

Rule based example

The paper on the hybrid model discussed previously presents a table of the various rule types used in the financial institution. We can extrapolate on what kind of rules they are using. Note that AP relates to wire transfers while CP stands for Card Payment.

Rule A could be a list of the most common IBAN accounts users tend to send money to; for example, accounts related to hospitals, internet or electricity bills are used regularly by multiple users and could be common enough to be encoded in the system.
Rule C: the payment comment most likely refers to the company name as well as the free text boxes where the payer can write some information
Rule R: same thing but with further data cleaning. It is interesting that they split it in two. Maybe some data enrichment techniques are used to try to improve on the previous rule
Rule I: It is not uncommon for companies to have accounts in multiple banks. As part of its KYC and AML programs, the financial institution would have information on the industry each company belongs to. So my guess would be that those internal codes are based on information about their own clients
Rule M: MCC is a 4-digit code used to get information on a merchant in a card payment transaction. Lists of those codes can be found with a simple Google search.
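
To make the guesses above a bit more concrete, here is a minimal Python sketch of how rules like A, C and R might be chained. Everything in it is an assumption made for illustration: the IBAN, the keywords, the category names, the cleaning step and the order in which the rules are tried. The paper does not publish the institution's actual rules.

```python
import re

# Hypothetical lookup tables; real ones would be curated by domain experts.
KNOWN_IBANS = {"EE000000000000000001": "Utilities"}  # rule A: made-up account
COMMENT_KEYWORDS = {"pharmacy": "Health", "electricity": "Utilities"}  # rules C / R


def normalize(text):
    """Rule R style cleaning: lowercase, turn runs of non-letters into spaces."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def classify(iban, comment):
    """Try each rule in turn; return the first category found, else None."""
    if iban in KNOWN_IBANS:                      # rule A: known beneficiary account
        return KNOWN_IBANS[iban]
    for word in normalize(comment).split():      # rules C / R: keyword in comment
        if word in COMMENT_KEYWORDS:
            return COMMENT_KEYWORDS[word]
    return None                                  # no rule fired
```

A transaction that no rule covers comes back as None, which is exactly the gap a machine learning model would be asked to fill in the hybrid approach.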

Rule based example with crowdsourcing

This example comes from “Using classification algorithms for smart suggestions in accounting systems”. A company named SpeedLedger classifies transactions for other organizations, and the end user accepts or rejects the label. They use a naïve classifier:

They check the category used (e.g. “Restaurant”) in the last transaction from the organization with the same beneficiary
If no previous transaction with that beneficiary is found, they search for the last transaction with identical text
If nothing matches, no suggestion is made

The classifier is applied at the user level, which is different from the previous case. We can assume it is to avoid situations where a user has to reject the proposed label for the same beneficiary multiple times.
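
The three steps above can be sketched in a few lines of Python. The function name, the tuple layout of the history and the sample categories are assumptions for illustration, not SpeedLedger's actual code.

```python
def suggest_category(history, beneficiary, text):
    """Suggest a category from one user's past transactions.

    history: list of (beneficiary, text, category) tuples, oldest first.
    """
    # Step 1: most recent transaction with the same beneficiary.
    for past_beneficiary, _, category in reversed(history):
        if beneficiary is not None and past_beneficiary == beneficiary:
            return category
    # Step 2: most recent transaction with identical text.
    for _, past_text, category in reversed(history):
        if past_text == text:
            return category
    # Step 3: nothing matches, so no suggestion is made.
    return None
```

Because the history belongs to a single user, a label the user corrected once is what gets suggested the next time the same beneficiary shows up.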

MCC

It looks like MCC is the easiest way to start with transaction classification, as each card payment is assigned a ready-to-use category (among 500). However, there are some caveats:

The rule-based mapping from the MCC to the two-level categorical hierarchy used in the financial institution leads to inherent mapping problems, as two different points of view on consumption are considered
MCC-based mappings introduce the problem of product heterogeneity, as a single card payment processing agreement only covers one MCC code, whereas the merchant may sell multiple types of products under it
Payment type bias due to customer behavior: customers tend to use wire transfers for the savings and leisure & travel payment categories, while payment cards are most often used for food
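
A minimal sketch of such an MCC mapping in Python is shown below. The three MCC codes are real, but the two-level category names on the right-hand side and the "Unknown" fallback are assumptions for illustration.

```python
# Hypothetical mapping from MCC to an in-house (category, subcategory) pair.
MCC_TO_CATEGORY = {
    "5411": ("Food", "Groceries"),     # grocery stores and supermarkets
    "5812": ("Food", "Restaurants"),   # eating places and restaurants
    "5541": ("Transport", "Fuel"),     # service stations
}


def category_from_mcc(mcc):
    """Return the (category, subcategory) for an MCC, or a fallback pair."""
    return MCC_TO_CATEGORY.get(mcc, ("Unknown", "Unknown"))
```

The heterogeneity caveat is visible here: a supermarket registered under 5411 is always mapped to Food, even when the basket was mostly electronics.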

Improving results with external semantic resources

The authors of the “Automatic Classification of Bank Transactions” paper had some interesting approaches when trying to improve the results of the Sparebank1 payment classification system with external resources:

Brønnøysund Registry

What is it: Norwegian government agency managing information about Norwegian companies, with features like company address, business holder, and the industry code used to represent the category and subcategory
How they used it: Not scalable as it requires rules to be maintained for every business and type of transaction
Results: Good accuracy improvement

Google Places API

What is it: Service that accepts HTTP requests for location data
How they used it: The API “responds with a list of places matching the text string, each of which contains a number of features. One of these features is ’types,’ which describe the place, ex: Supermarket”
Results: Accuracy improvement. Because each API call is billed, the authors query Google Places only when the classifier is not sufficiently confident (details p. 34)
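
The "only call the paid API when unsure" idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: the threshold value, the function names and the way the external answer replaces the classifier's label are all assumptions, and the lookup is a plain callable standing in for a real Google Places request.

```python
def classify_with_fallback(text, classifier, lookup, threshold=0.8):
    """Return (label, used_external_lookup).

    classifier(text) -> (label, confidence between 0 and 1)
    lookup(text)     -> label or None; stands in for the billed API call
    """
    label, confidence = classifier(text)
    if confidence >= threshold:
        return label, False            # confident enough: no billed call
    external = lookup(text)            # only now pay for the external lookup
    return (external or label), True   # keep the classifier's label if lookup fails
```

Raising the threshold trades API cost for accuracy, so it is a value worth tuning on held-out data.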

Wikidata & DBpedia

What is it: DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project
How they used it: To acquire the meaning of the words in the transaction texts. A company name is usually present in the transaction text, so one approach is to find the company name and then look up which industry the company operates in.
Results: No accuracy improvement

Yandex API

What is it: Russian tech company providing various services
How they used it: To translate transaction texts from Norwegian to English

WordNet

What is it: Module provided by the Natural Language Toolkit (NLTK) where a word can be sent in and synonyms are returned
How they used it: The objective is to bring more semantic meaning into the transaction texts (café -> coffee place)

Interesting ML techniques

Bag of words

The bag-of-words technique is used to convert the transaction descriptions into a representation better suited for machine learning:

A standard model to represent text is the Bag-of-Words model (BoW), where the idea is that the meaning of a text comes from its words, regardless of their position in the document. Cutting all the words out of a text, reorganizing them and taping them back together would, in the BoW model, be seen as the exact same text
The more distinct words a document set has, the larger the feature space gets. The features carry weights saying something about each particular feature’s importance
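
A minimal pure-Python sketch of the idea is shown below. It uses raw word counts as the weights, which is the simplest possible choice and an assumption here; the sample transaction texts are made up.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    # One feature per distinct word across the whole document set.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["rema 1000 oslo", "oslo rema 1000"])
# Same words in a different order give identical vectors.
```

Each new distinct word adds one column to every vector, which is why the feature space grows with the vocabulary.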

Fusion

Text fusion is a natural language processing (NLP) task that involves combining information from multiple sources into a single coherent text. For example, sentence fusion is the task of combining several independent sentences into a single coherent text
Data fusion can take place at several levels of data processing. Joshi et al. state five levels for the fusion of multimodal biometrics: sensor, feature, match score, rank and decision level.
Early fusion: process of merging several input modalities into a single feature vector before feeding it into a single machine learning model for training
Late fusion: fusion scheme that first reduces unimodal features to separately learned concept scores, then integrates these scores to learn concepts. In other words, all the modalities are learned independently and are combined right before the model makes a decision (aggregating predictions at the decision level). Multimodal means you have several types of input, like sound, text or image; in the paper, modalities are to be understood as features
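
The difference between the two schemes can be sketched in Python as follows. The models are passed in as plain callables, the two modalities (text and amount features) are an illustrative choice, and averaging the per-class scores is only one simple way to combine predictions at the decision level; none of this is taken from the thesis itself.

```python
def early_fusion(text_features, amount_features, model):
    """Concatenate the modality features into one vector for a single model."""
    return model(text_features + amount_features)


def late_fusion(text_features, amount_features, text_model, amount_model):
    """Score each modality with its own model, then average per-class scores."""
    text_scores = text_model(text_features)
    amount_scores = amount_model(amount_features)
    return [(t + a) / 2 for t, a in zip(text_scores, amount_scores)]
```

With early fusion the single model can learn interactions between modalities, while late fusion lets each modality use the model type that suits it best.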
