Using the Hadoop Ecosystem to Meet Basel 239 Requirements

  • Lowell Bryan
  • Abhishek Mehta


“One of the most significant lessons learned from the global financial crisis that began in 2007 was that banks’ information technology (IT) and data architectures were inadequate to support the broad management of financial risks. Many banks lacked the ability to aggregate risk exposures and identify concentrations quickly and accurately at the bank group level, across business lines and between legal entities. Some banks were unable to manage their risks properly, because of weak risk data aggregation capabilities and risk reporting practices. This had severe consequences to the banks themselves and to the stability of the financial system as a whole.”

Thus begins the white paper “Principles for Effective Risk Data Aggregation and Risk Reporting,” issued by the Basel Committee on Banking Supervision in January 2013. Based in Basel, Switzerland, the committee is hosted by the Bank for International Settlements, the international organization of central banks, and is the closest thing the world has to a global bank regulator.

In the paper, the Committee laid out 14 principles describing how risk data aggregation and management should be undertaken by banks and supervised by national regulators. The principles are common sense, and the logic behind them is compelling. Since the paper was issued, these principles have become known as the “Basel 239 requirements.”

The problem, as bank executives and their boards realized immediately upon reading the white paper, was the major disconnect between the sweeping aspirations of the principles and the operating reality of banks’ systems for data aggregation and data management at the time.

The deadline for the very large Global Systemically Important Banks (G-SIBs) to meet these requirements is almost here (January 1, 2016). Despite the three years of lead time, almost all observers familiar with the state of the industry’s data management capabilities believe that most G-SIBs will not be in compliance by then. Indeed, as of January 1, 2015, nearly 50% of G-SIBs self-reported that they would be in “material non-compliance” by the target date. Most observers believe that these self-assessments are overly optimistic and understate the number of G-SIBs that will be in material non-compliance as of January 1, 2016. The majority also believe that an even greater percentage of large national institutions below G-SIB size will be in material non-compliance as their national regulators set their respective deadlines.

The difficulties in achieving compliance are not for lack of trying by the banking industry. Almost all of the large G-SIBs have undertaken massive, “brute force” efforts to get their data aggregation and data management capabilities in shape. Many have assigned top executives to oversee the efforts; these executives have, in turn, hired armies of outside professionals and spent massively on technology to address their compliance issues.

Much progress has been made. Given how far behind most banks were initially, however, it is unclear how many of these “brute force” efforts will be judged to be in “material compliance” come January 2016, or even later.

The feedback from the Federal Reserve’s annual “stress tests,” which the Federal Reserve calls the “Comprehensive Capital Analysis and Review” (CCAR), has not been encouraging. In its follow-up discussions with the banks, the Federal Reserve has consistently criticized three things:

  • Data quality (i.e., error rates that are too high)
  • Data lineage back to the original source systems (i.e., too much aggregation of data through semi-manual spreadsheets)
  • Historical data (i.e., not enough of it)
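
To make these three criticisms concrete, the following is a minimal Python sketch of how each could be expressed as a measurable check. It is an illustration only, not a description of how the Federal Reserve or any bank actually measures them: the record layout, the field names, the definition of an “error,” and the sample values are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RiskRecord:
    """One risk data point. All field names are hypothetical."""
    exposure: Optional[float]     # None models a missing value
    as_of: date                   # reporting date of the record
    source_system: Optional[str]  # lineage: originating system, if known


def error_rate(records: List[RiskRecord]) -> float:
    """Share of records with a missing or negative exposure (assumed error rule)."""
    if not records:
        return 0.0
    bad = sum(1 for r in records if r.exposure is None or r.exposure < 0)
    return bad / len(records)


def lineage_coverage(records: List[RiskRecord]) -> float:
    """Share of records that can be traced to a named source system."""
    if not records:
        return 0.0
    traced = sum(1 for r in records if r.source_system)
    return traced / len(records)


def history_years(records: List[RiskRecord], today: date) -> float:
    """Approximate number of years between the oldest record and today."""
    if not records:
        return 0.0
    oldest = min(r.as_of for r in records)
    return (today - oldest).days / 365.25


# Hypothetical sample data, including one missing value and one record
# with no lineage (as might come from a semi-manual spreadsheet).
records = [
    RiskRecord(1_000_000.0, date(2010, 1, 1), "loans_ledger"),
    RiskRecord(None, date(2013, 6, 30), "trading_book"),
    RiskRecord(250_000.0, date(2014, 12, 31), None),
    RiskRecord(500_000.0, date(2015, 3, 31), "trading_book"),
]

print("error rate:", error_rate(records))
print("lineage coverage:", lineage_coverage(records))
print("years of history:", round(history_years(records, date(2016, 1, 1)), 1))
```

In practice each check would be far richer, but the point is that all three criticisms can be tracked as explicit metrics over the same set of records.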


The problem is that, in trying to meet the Basel 239 standards, large banks are running into fundamental limits in the “Big Iron” technologies underlying their data architectures. Risk data in a G-SIB is sourced today from literally thousands of reporting systems and databases of various sizes and complexity. Trying to aggregate and manage all of this data through semi-manual approaches is a nightmare. The primary “Big Iron” alternative, however, is to create a single enterprise data warehouse devoted to managing all risk data. This exposes underlying limits on how much data volume these warehouses can handle and raises related issues, such as how much data history they can maintain. Additionally, “Big Iron” technologies are extremely expensive. As a result, most banks are attempting to comply with Basel 239 requirements with a patchwork of direct reporting systems, enterprise data warehouses, and semi-manual efforts to fill in the gaps.

The good news is that a superior technology, collectively referred to as the “Hadoop ecosystem,” became available about five years ago and has reached a state of maturity that makes it a viable option for banks to overcome the limitations of the “Big Iron” legacy systems. In fact, Hadoop is already being used by most large banks to store the vast volume of “raw” data being produced today, not only for Basel 239 purposes but for all purposes.

We are deliberately using the phrase “Hadoop ecosystem” rather than Big Data to describe this technology. The phrase “Big Data” has been hyped to the point that it has lost its meaning. All the major vendors maintain that they deliver solutions to meet Big Data needs. They also maintain that they use Hadoop. In reality, they deliver technologies where most of the aggregation and management of risk data is in data warehouses, not in Hadoop (which they primarily use just for “raw” data storage). As a result, they run into the same limits typical of “Big Iron” technologies.



We believe that taking greater advantage of new technologies like Hadoop can help banks meet Basel standards in the near term at far more modest cost than trying (and probably failing) to meet those standards using more robust enterprise data warehouses and reporting systems. In the longer term, the same Hadoop ecosystem can serve as the foundation of banks’ future data architecture. It can meet the challenge of running a 21st-century bank that is fit to succeed in the Digital Age.

The starting point for taking advantage of the Hadoop ecosystem to meet Basel 239 standards is to use it to create a total, institution-wide, “ready” Risk Data Asset or, simply, what we call a “Risk Data Asset.” By “Risk Data Asset,” we mean a single source of clean, consistent data that is made “ready” within the Hadoop ecosystem to provision all the data needed for all risk applications in a manner that is Basel 239 compliant.
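
The value of a single, consistent source can be illustrated on a very small scale. The sketch below is plain Python rather than anything running in Hadoop, and it is not a description of how a Risk Data Asset is actually built. It only shows the underlying idea: once exposures sit in one clean data set, the same records can be aggregated by legal entity, by business line, or by counterparty, and concentrations can be flagged. The record layout, the entity and counterparty names, the amounts, and the 25% concentration threshold are all hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, already-cleaned exposure records:
# (legal_entity, business_line, counterparty, exposure_in_millions)
Exposure = Tuple[str, str, str, float]

EXPOSURES: List[Exposure] = [
    ("Bank UK", "Trading", "Acme Corp", 40.0),
    ("Bank UK", "Lending", "Acme Corp", 35.0),
    ("Bank US", "Lending", "Acme Corp", 50.0),
    ("Bank US", "Trading", "Globex", 30.0),
    ("Bank US", "Lending", "Initech", 45.0),
]

# Column positions within each record
LEGAL_ENTITY, BUSINESS_LINE, COUNTERPARTY, AMOUNT = 0, 1, 2, 3


def aggregate(records: List[Exposure], key_index: int) -> Dict[str, float]:
    """Sum exposures grouped by the chosen column."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[AMOUNT]
    return dict(totals)


def concentrations(records: List[Exposure], limit_share: float) -> Dict[str, float]:
    """Return counterparties whose share of total exposure exceeds limit_share."""
    by_counterparty = aggregate(records, COUNTERPARTY)
    total = sum(by_counterparty.values())
    if total == 0:
        return {}
    return {
        name: amount
        for name, amount in by_counterparty.items()
        if amount / total > limit_share
    }


print("by legal entity: ", aggregate(EXPOSURES, LEGAL_ENTITY))
print("by business line:", aggregate(EXPOSURES, BUSINESS_LINE))
print("concentrations:  ", concentrations(EXPOSURES, 0.25))
```

Because every view is computed from the same records, the group-level totals reconcile with one another by construction, which is the property that a patchwork of separate reporting systems struggles to deliver.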

In the remainder of this document, we will elaborate on these ideas by describing:

  1. Challenges banks are facing in meeting Basel 239 requirements
  2. Underlying limits of enterprise data warehouses in meeting these requirements
  3. Capabilities of a “ready” Risk Data Asset in meeting Basel 239 requirements
  4. Steps in building a Risk Data Asset