This is a question that we get often from clients and potential clients. What’s the fuss with the sequencing of these three words? After all, there will be an Extract (E) from the source system; the extracted data will undergo a Transformation (T) process; and the result will then be Loaded (L) into the target system. So why does it matter if the L goes before the T? The answer lies in how the infrastructure is set up to accomplish this process.
In a typical reporting system implementation, the source system is built before the reporting solution. As a result, the source system is optimized to perform the tasks it was meant to, and not much else. This is usually because reporting requirements either were not considered when the system was created, or had not yet evolved enough to justify building out a reporting system. Examples of such source systems are Customer Relationship Management (CRM) applications and transactional (OLTP) systems. In other words, once a new system is added to the environment, the source system cannot be touched, lest the entire environment end up being rebuilt from scratch. As a result, the reporting or analysis system is an add-on, designed to place no additional load on the source system. Tools like Oracle GoldenGate are popular in the extraction stage of this kind of traditional environment, since they essentially read the transaction log files of the source system to glean the data to be extracted, putting no load on the source system whatsoever.
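To make the idea concrete, here is a minimal sketch in Python of log-based extraction. A made-up JSON change log stands in for a real transaction log, and the log format and field names are purely illustrative (not GoldenGate’s): the point is that the extractor replays recorded changes instead of querying the source database, so the source does no extra work.

```python
import io
import json

# Hypothetical change-log entries, standing in for a source system's
# transaction log -- the kind of file that log-based tools read so the
# source database itself is never queried.
change_log = io.StringIO("\n".join([
    json.dumps({"op": "INSERT", "table": "orders", "row": {"id": 1, "amount": 250}}),
    json.dumps({"op": "INSERT", "table": "orders", "row": {"id": 2, "amount": 100}}),
    json.dumps({"op": "UPDATE", "table": "orders", "row": {"id": 2, "amount": 120}}),
]))

def extract_changes(log_file):
    """Replay committed changes from the log; no query ever hits the source DB."""
    for line in log_file:
        yield json.loads(line)

changes = list(extract_changes(change_log))
```

Downstream stages can then consume `changes` without the source system ever noticing.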
Until a few years ago, it was normal to stage the data in an intermediate system before pushing it into the target. This was simply because the target could not be taken offline to perform the load, or because the target system was optimized to retrieve and report (and did not have the resources to perform heavy crunching of numbers or data). As a result, the intermediate stage would be optimized to perform calculations and transformations on the data, which led to that stage being called ‘Transform,’ since that is where the data underwent its transformation. This approach also kept the target reporting system independent of how the transform stage was implemented. Consequently, many organizations implemented three separate systems to satisfy the requirements of each stage, each usually requiring its own hardware. This is the typical ETL – Extraction, Transformation, and Loading – system.
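As a rough illustration of the three separate stages, here is a minimal ETL sketch using Python with in-memory SQLite databases standing in for the real source, staging, and target systems; the table names and data are invented for the example.

```python
import sqlite3

# Source: a transactional table we may only read from.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "East", 250.0), (2, "West", 100.0), (3, "East", 50.0)])

# 1. Extract: pull the raw rows out of the source.
rows = source.execute("SELECT id, region, amount FROM orders").fetchall()

# 2. Transform: the number-crunching happens on the intermediate (staging)
#    system -- here, plain Python aggregating totals per region.
totals = {}
for _id, region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# 3. Load: only the finished results reach the reporting target.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE region_totals (region TEXT, total REAL)")
target.executemany("INSERT INTO region_totals VALUES (?, ?)", totals.items())
```

Note that the target only ever sees the summarized `region_totals`; the detail rows stay behind in the source, which is exactly why traceability suffers in this arrangement.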
More recently, computing hardware has become far more capable. Database software (database management systems, or DBMS) has also grown more powerful, often working in concert with purpose-built machines like Oracle’s Exadata. As a result, systems that perform reporting can now also perform data calculations. In some cases, the transformation technology ties closely into the technology or platform used for reporting. For example, software like Financial Data Quality Management Enterprise Edition (FDMEE) ties closely into Enterprise Performance Management (EPM) applications, because they are often built on the same system and are integral to the solution. In this case, the ‘Load’ into the target system happens before the data undergoes its ‘Transform’ on that same system for reporting. Among other things, this allows the reporting applications to drill back into the source system, so that a data point can be traced all the way back to the source transactions that created it.
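The ELT ordering can be sketched the same way. In this hypothetical example (again plain SQLite, not FDMEE or EPM), the raw rows are loaded into the target first, the transform then runs as SQL inside that same engine, and because the raw rows remain alongside the summary, a reported total can be drilled back to the transactions behind it.

```python
import sqlite3

# Target reporting database: raw transactions are loaded first (the "L")...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_orders (id INTEGER, region TEXT, amount REAL)")
db.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
               [(1, "East", 250.0), (2, "West", 100.0), (3, "East", 50.0)])

# ...then the transform (the "T") runs inside the target engine itself.
db.execute("""
    CREATE TABLE region_totals AS
    SELECT region, SUM(amount) AS total
    FROM raw_orders
    GROUP BY region
""")

# Because the raw rows still live alongside the summary, a reported
# number can be drilled back to the transactions that produced it.
def drill_back(region):
    return db.execute(
        "SELECT id, amount FROM raw_orders WHERE region = ?", (region,)
    ).fetchall()
```

The `drill_back` helper is the toy version of what drill-back buys you: the lineage from a summary figure to its source transactions never leaves the reporting system.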
So is the difference between ETL and ELT purely semantics? Not necessarily. It is more a rethinking of the approach taken to transferring transactional data into reporting systems, to take advantage of changes in technology. Along the way, it has enhanced the reporting solution with added value, such as the tracing of data points. Traditional ETL was one-way only – transactional system to reporting. With ELT, it is possible to trace back from the reporting system to the transactional system as needed, while retaining the original requirement that the reporting system add no load to the transactional system.
Author: Andy Tauro, Performance Architects