Recent advances in machine learning, increases in computational power, and reductions in data storage costs have revolutionized many computational techniques used by financial market players, including Natural Language Processing (NLP) and Sentiment Analysis (SA). We wish to keep members up to date on these developments.
NLP is concerned with the processing and analysis of human language data by computers, and SA is a subfield of NLP that studies the mood, opinions, and attitudes expressed in written text (e.g., whether the author is positive or negative about a certain subject).
According to a 2019 market study by Research and Markets, the NLP market size is estimated to be USD 10.2 billion and is expected to reach USD 26.4 billion by 2024, a Compound Annual Growth Rate (CAGR) of 21%. This makes NLP the fastest-growing technology in global AI for the financial asset management market. The major growth factors are increased smart device usage, the growth of cloud-based solutions, and NLP-based applications that improve customer service.
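As a quick consistency check on the figures quoted above, the growth rate implied by the two market sizes can be recomputed. This is a minimal sketch: the function names are illustrative, and the five-year horizon (2019 to 2024) is an assumption based on the date of the study, not a figure stated in it.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, an end value, and a horizon."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after compounding start_value at a constant annual rate."""
    return start_value * (1.0 + rate) ** years


# Market sizes quoted in the text, in USD billions.
START, END = 10.2, 26.4
# Assumption: the estimate refers to 2019, giving a five-year horizon to 2024.
YEARS = 5

cagr = implied_cagr(START, END, YEARS)
projected = project(START, 0.21, YEARS)

print(f"Implied CAGR over {YEARS} years: {cagr:.1%}")
print(f"USD {START} billion compounded at 21% for {YEARS} years: USD {projected:.1f} billion")
```

Under that assumption the implied rate comes out close to the quoted 21%, so the three figures are mutually consistent.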
Among the leading NLP solution providers, we note two things: first, the Tech Giants (the so-called GAFA quartet: Google, Apple, Facebook, and Amazon) are among the leading participants; second, only two non-US companies are represented, Baidu (China) and Linguamatics (UK) (see Table 1).
The combination of NLP and Machine Learning (ML) provides an avenue to generate insights with a level of precision significantly higher than ever before. For instance, in 2013 a team of researchers at Google created an ML-based model, Word2Vec, that maps words to vectors in such a way that semantically close words have close vector representations. This, in turn, allows “arithmetic on words” which, loosely speaking, can be thought of as, for example,
Brother – Sister ≈ Man – Woman.
This example follows from the pattern “Brother is to Sister as Man is to Woman”.
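The “arithmetic on words” idea can be sketched with a few lines of Python. The vectors below are hypothetical, hand-picked three-dimensional values chosen only so that the pattern is visible; they are not real Word2Vec output, which is learned from large text corpora and typically has hundreds of dimensions. The helper names are likewise illustrative.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only (not real Word2Vec output).
# Axis 0 loosely encodes gender, axis 1 "sibling", axis 2 is filler.
TOY_VECTORS = {
    "man":     [1.0, 0.0, 0.2],
    "woman":   [-1.0, 0.0, 0.2],
    "brother": [1.0, 1.0, 0.1],
    "sister":  [-1.0, 1.0, 0.1],
    "market":  [0.0, -0.5, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def difference(word_a, word_b, vectors=TOY_VECTORS):
    """Element-wise vec(word_a) - vec(word_b)."""
    return [a - b for a, b in zip(vectors[word_a], vectors[word_b])]


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Solve 'a is to b as c is to ?' as the word nearest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


similarity = cosine(difference("brother", "sister"), difference("man", "woman"))
print(f"cosine(brother - sister, man - woman) = {similarity:.3f}")
print("man is to woman as brother is to:", analogy("man", "woman", "brother"))
```

With learned embeddings the two difference vectors are only approximately parallel, which is why the relation is written with ≈ rather than =.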
The NLP/ML market is highly competitive. For example, in December 2019 China’s Baidu announced that its ML-based software ERNIE had achieved the highest score, 90.1, in the General Language Understanding Evaluation (moving Microsoft and Google to second and third place, respectively). In January 2020 the ranking changed again, with Google’s T5 Team achieving the highest score of 90.3.
General Language Understanding Evaluation (GLUE) is a collection of datasets used to train, evaluate, and analyze NLP models relative to one another.