Machine Learning: Unsupervised Learning in Finance

  • Global Risk Institute


The last decade has witnessed large-scale adoption of machine learning tools in finance. According to the latest report by Refinitiv, the number of data science teams in financial services firms has risen by more than 260 per cent since 2018 (see Refinitiv 2020). This extraordinary growth stems largely from recent high-profile applications of machine learning (e.g., Google Neural Machine Translation, AlphaGo) and reveals the potential of machine learning to transform almost all aspects of the financial services industry. As a result, it has become critical for financial executives to be able to communicate effectively with data science professionals. For instance, in one of its reports, J.P. Morgan’s quantitative investing and derivatives strategy team wrote (J.P. Morgan 2017):

Regardless of the timeline and shape of the eventual investment landscape, we believe that analysts, portfolio managers, traders and CIOs will eventually have to become familiar with Big Data and Machine Learning approaches to investing.

In its “Financial Innovation” series, the Global Risk Institute provides non-technical reviews of Machine Learning (ML) tools, their financial applications, and the associated risks that executives should be aware of when developing ML solutions in their organizations. In this paper we discuss Unsupervised Learning (UL), one of the four main categories of ML.* The financial applications that we consider include: understanding country risk for foreign investment, trading, model risk management, fraud detection, assessment of companies’ financial situations, financial regulation, identification of complex relationships in stock markets, and early warning models for financial crises.

UL is used to draw inferences from data that carry no predefined target or label. The main goal is not to predict a certain variable, but rather to understand the structure of the data. Methods of UL can often be categorized as either clustering (splitting the data into groups, also called clusters) or factor analysis (identifying the main factors that best describe the data).
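
For readers who want to see the two categories side by side, the following is a minimal sketch rather than a production method. It assumes a made-up data set of six observations with two hypothetical indicators each; the function names (sq_dist, kmeans, first_factor) are illustrative and come from no particular library. Clustering is illustrated with k-means, one common choice among many, and the factor step is a simple stand-in that extracts only the first principal component, which is related to but not the same as a full factor analysis.

```python
# Toy, made-up data: six observations, two hypothetical indicators each.
data = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
        (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]


def sq_dist(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Clustering sketch: k-means with a deterministic farthest-point start."""
    centres = [points[0]]
    while len(centres) < k:
        # Next starting centre: the point farthest from all chosen centres.
        centres.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every observation to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        # Move each centre to the mean of the observations assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def first_factor(points, iterations=100):
    """Factor sketch: first principal component via power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centred = [[x - m for x, m in zip(p, means)] for p in points]
    # Sample covariance matrix of the indicators.
    cov = [[sum(row[i] * row[j] for row in centred) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


labels = kmeans(data, k=2)
factor = first_factor(data)
print("cluster labels:", labels)
print("first factor direction:", factor)
```

On this toy data the first three and the last three observations fall into separate clusters, and the first factor weights the two indicators almost equally, reflecting that they move together. Neither step uses a target variable: both only describe structure already present in the data.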