When do you centralize ETL?
ETL is a complicated game to play. Sometimes you have to work around data quality issues, data structure issues, and even business processes that create data exceptions nobody could account for in the requirements-gathering phase of your sprint.
With all of the unknowns that your data creates, deciding on the right architecture for your ETL environment is important. The devil is in the details, but at a high level you have two broad choices. You can centralize your ETL using a tool that creates a one-stop shop for development and execution, or you can decentralize and run your ETL where the data sits.
The first method uses tools like Informatica or DataStage to consolidate metadata and transformation logic into one repository. Tools like these can orchestrate moving data between many different types of sources and targets, and they give an administration team a single place to monitor the processes.
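To make the "one repository" idea concrete, here is a minimal sketch in Python. It is not how Informatica or DataStage work internally; the names `Job`, `CentralRepository`, and the in-memory `systems` dictionary are all hypothetical stand-ins. The point is only that job definitions, execution, and monitoring live in one place.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Job:
    """Hypothetical job definition: where to read, where to write, how to transform."""
    name: str
    source: str
    target: str
    transform: Callable[[Row], Row]


@dataclass
class CentralRepository:
    """One place that stores every job and records every run."""
    jobs: List[Job] = field(default_factory=list)
    run_log: List[str] = field(default_factory=list)

    def register(self, job: Job) -> None:
        self.jobs.append(job)

    def run_all(self, systems: Dict[str, List[Row]]) -> None:
        for job in self.jobs:
            try:
                rows = [job.transform(r) for r in systems[job.source]]
                systems[job.target].extend(rows)
                self.run_log.append(f"{job.name}: ok ({len(rows)} rows)")
            except Exception as exc:
                # Failures end up in the same log the admin team already watches.
                self.run_log.append(f"{job.name}: failed ({exc})")


# In-memory lists stand in for real source and target systems (assumption).
systems: Dict[str, List[Row]] = {
    "crm": [{"id": 1, "name": " Ada "}, {"id": 2, "name": "Grace"}],
    "warehouse": [],
}

repo = CentralRepository()
repo.register(
    Job("load_customers", "crm", "warehouse",
        lambda r: {**r, "name": str(r["name"]).strip()})
)
repo.run_all(systems)
print(repo.run_log)
```

Because every job is registered in the same object, the run log gives the full picture without visiting each system.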
When you decentralize your ETL platform, you spread scripts (stored procedures, for example) to the impacted systems. If your systems run in the same ecosystem, you may be able to have one coordinating server that executes the scripts, but when you go outside of that ecosystem your method of coding and executing the ETL will have to change. As an example, suppose you work in both an MS SQL ecosystem and a Teradata ecosystem. These two systems do not communicate well with each other, which creates the need to operate in silos based on the ecosystem. In this architecture there is usually more than one place to administer the processes, making the full picture harder to gather.
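The silo effect can be sketched the same way. In this hypothetical Python example, `EcosystemRunner` and `full_picture` are illustrative names, and the lambdas are placeholders for whatever scripts each ecosystem would really run. Each ecosystem keeps its own scripts and its own log, so seeing everything means collecting from every runner.

```python
from typing import Callable, Dict, List


class EcosystemRunner:
    """Runs the scripts that live inside one ecosystem and logs locally."""

    def __init__(self, ecosystem: str) -> None:
        self.ecosystem = ecosystem
        self.scripts: Dict[str, Callable[[], int]] = {}
        self.local_log: List[str] = []

    def add_script(self, name: str, script: Callable[[], int]) -> None:
        self.scripts[name] = script

    def run(self) -> None:
        for name, script in self.scripts.items():
            rows = script()  # each script reports how many rows it processed
            self.local_log.append(f"[{self.ecosystem}] {name}: {rows} rows")


def full_picture(runners: List[EcosystemRunner]) -> List[str]:
    """Gather every local log; nothing does this for you in a decentralized setup."""
    return [line for runner in runners for line in runner.local_log]


# Placeholder scripts returning made-up row counts (assumption).
mssql = EcosystemRunner("mssql")
mssql.add_script("load_orders", lambda: 120)

teradata = EcosystemRunner("teradata")
teradata.add_script("load_inventory", lambda: 45)

for runner in (mssql, teradata):
    runner.run()

for line in full_picture([mssql, teradata]):
    print(line)
```

The extra `full_picture` step is the cost the paragraph above describes: the monitoring view has to be assembled rather than read from one repository.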
So, how do you decide which type of ETL architecture is right for you? Here are some factors to consider:
- The size and complexity of your data: If you have a lot of data or if your data is complex, a centralized ETL tool may be a better option. Keeping the transformation logic and metadata in one repository tends to make large or complex processes easier to manage than scripts spread across systems.
- Your budget: Centralized ETL tools can be more expensive than decentralized ETL. They typically need their own licenses and hardware on top of what your databases already require.
- Your team's skills and experience: If your team has the skills and experience to build and manage ETL processes by hand, then you may be able to get away with decentralized scripts such as stored procedures. If it does not, a centralized ETL tool may be a better option.
- Your future plans: If you plan to grow your data or your ETL requirements in the future, then a centralized ETL tool may be a better option, because adding new sources and targets to one tool tends to scale better than adding new silos of scripts.
- The level of automation you need: Centralized ETL tools can provide a high level of automation, while stored procedures may require more manual intervention.
- The level of control you need: Centralized ETL tools give you a single point of control over the ETL process, while stored procedures spread that control across the systems they run on.
- The flexibility you need: Centralized ETL tools may be less flexible than stored procedures, which can be customized to meet specific needs.
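One way to weigh these factors together is a simple scoring sheet. The Python sketch below is only an illustration: the function name `lean`, the factor keys, the scores, and the weights are all hypothetical, and you would substitute your own judgment for each. A positive score for a factor means it pushes you toward a centralized tool, and a negative score means it pushes you toward decentralized scripts.

```python
from typing import Dict


def lean(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of factor scores; unlisted factors get a weight of 1.0."""
    return sum(weights.get(factor, 1.0) * score for factor, score in scores.items())


# Example scores on a -2..2 scale (made-up values for illustration only).
scores = {
    "data_complexity": 2,   # complex data favors centralized
    "budget": -2,           # tight budget favors decentralized
    "team_skills": -1,      # strong scripting skills favor decentralized
    "growth_plans": 1,
    "automation": 1,
    "control": 1,
    "flexibility": -1,
}
weights = {"budget": 2.0}   # assumption: budget matters twice as much here

total = lean(scores, weights)
verdict = "centralized" if total > 0 else "decentralized" if total < 0 else "no clear lean"
print(total, verdict)
```

A sheet like this will not make the decision for you, but it forces each factor to be stated explicitly and makes it easy to revisit the answer when the inputs change.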
Ultimately there is no "one size fits all". At some point a decision will have to be made and the business will move that way. The good news is these decisions can change. That is not to say that it will be an easy change, but every IT system matures and changes. Some just change more than others. So do not be afraid to make the best decision you can at that moment in time. Evaluate your current system and move forward conquering your data!