Data Governance: Check and Check Again


We are progressing in our data governance by building out the framework.  We are moving data into areas where the business can use it, but how do we know they should use it?  More importantly, how do they know that they CAN use it?

So much of what we do relies on trust.  The business trusts that we understand what they need and that we will deliver reliable data.  The first time they find an issue, that trust is broken.  Part of our framework builds out data validation processes, but how do we put that into practice?

One of the first things we need to do is check ourselves. Your data governance framework can be used to create natural data quality checkpoints. Is your data pipeline ready to migrate? Did you check it? When your process captured the requirements, did you ask who in the business would validate the work? Your processes are the first place to start checking your data.

Some specific examples of how you can use your data governance framework to create natural data quality checkpoints are:

  • Require that all data be validated before it is loaded into a data warehouse. This could be done using data validation rules or by running the data through a data quality tool.
  • Require that data be audited on a regular basis. Even long-running, stable data feeds can fall victim to bad data, so you have to verify periodically that they are correct and complete. This could be done using a data audit tool or by manually reviewing the data.
  • Set up alerts to notify you when data quality issues are detected. This will help you to identify and address data quality issues quickly.
  • Create a data quality dashboard to track key data quality metrics. This will help you to identify trends in data quality and to identify any areas where improvement is needed. In the long run this is probably one of the best ways to automate known-issue detection, using a set of standard KPIs.

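The dashboard checkpoint above can be sketched as a small KPI calculation. This is a hypothetical example; the metric names, record layout, and key field are illustrative assumptions, not a specific tool's API:

```python
# Hypothetical sketch: computing two standard data quality KPIs
# (completeness and duplicate rate) for a dashboard feed.
from collections import Counter

def quality_kpis(rows, key_field):
    """Return completeness and duplicate-rate KPIs for a list of record dicts."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "duplicate_rate": 0.0}
    # A record is "complete" if every field has a non-empty value.
    complete = sum(1 for r in rows if all(v not in (None, "") for v in r.values()))
    # Count extra occurrences of each business key as duplicates.
    keys = Counter(r[key_field] for r in rows)
    duplicates = sum(c - 1 for c in keys.values() if c > 1)
    return {
        "completeness": complete / total,
        "duplicate_rate": duplicates / total,
    }

rows = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},               # incomplete record
    {"id": "1", "email": "a@example.com"},  # duplicate key
]
kpis = quality_kpis(rows, "id")
```

Run on a schedule, metrics like these give the dashboard its trend lines, and a sudden drop in either KPI is exactly the kind of known issue the alerts should fire on.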
Data quality and validation are important, but most organizations do not have the resources to devote an entire team to the process. This is where automation really shines. There are a number of ways to automate data quality validations. One way is to use data validation rules. Data validation rules are sets of criteria that data must meet in order to be considered valid. These rules can be baked into ETL/ELT processes and fail the data before it even gets loaded. With a good set of processes to deal with the errors you could even "self heal" the data by alerting and presenting the business with ways to fix the data at the source.
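The fail-before-load idea can be sketched as a rule set applied to each batch, with failing rows quarantined for remediation. The rule names and fields here are hypothetical, not from any particular ETL tool:

```python
# Hypothetical sketch: validation rules applied in an ETL step.
# Rows that fail any rule are quarantined (with reasons) instead of loaded,
# so the business can be alerted and fix the data at the source.

RULES = {
    "id_present": lambda row: bool(row.get("id")),
    "amount_positive": lambda row: isinstance(row.get("amount"), (int, float))
                                   and row["amount"] > 0,
}

def validate_batch(rows):
    """Split a batch into loadable rows and quarantined rows with failure reasons."""
    good, quarantined = [], []
    for row in rows:
        failed = [name for name, rule in RULES.items() if not rule(row)]
        if failed:
            quarantined.append({"row": row, "failed_rules": failed})
        else:
            good.append(row)
    return good, quarantined

batch = [
    {"id": "A1", "amount": 100.0},
    {"id": "", "amount": -5},  # fails both rules
]
good, bad = validate_batch(batch)
# `good` continues to the warehouse; `bad` is routed to an alert/remediation queue
```

Keeping the rules in a plain dictionary like this is one way to move rule authorship closer to the business: the rules become configuration rather than pipeline code.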

Another way to automate data quality validations is to use a data quality tool. Data quality tools can be used to identify and correct a variety of data quality issues, such as missing values, duplicate records, and invalid data formats.
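The issue categories a data quality tool scans for can be illustrated with a simple profiling pass. This is a sketch only; the email pattern, field names, and issue labels are assumptions for the example:

```python
# Hypothetical sketch of the checks a data quality tool typically automates:
# missing values, duplicate records, and invalid data formats.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # illustrative format rule

def profile(rows):
    """Count common data quality issues across a list of record dicts."""
    issues = {"missing_values": 0, "duplicate_records": 0, "invalid_emails": 0}
    seen = set()
    for row in rows:
        issues["missing_values"] += sum(1 for v in row.values() if v in (None, ""))
        # Exact-duplicate detection via a sorted-field fingerprint.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            issues["duplicate_records"] += 1
        seen.add(fingerprint)
        email = row.get("email")
        if email and not EMAIL_RE.match(email):
            issues["invalid_emails"] += 1
    return issues

report = profile([
    {"id": "1", "email": "a@example.com"},
    {"id": "1", "email": "a@example.com"},  # exact duplicate
    {"id": "2", "email": "not-an-email"},   # invalid format
    {"id": "3", "email": ""},               # missing value
])
```

A commercial tool adds fuzzy matching, standardization, and correction on top of checks like these, but the categories of issue it reports are the same.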

Here are some specific examples of how you can automate data quality validations:

  • Use data validation rules to validate data before it is loaded into a data warehouse.
  • Use a data quality tool to identify and correct data quality issues on a regular basis.
  • Set up automated alerts to notify you when data quality issues are detected.
  • Create a data quality dashboard to track key data quality metrics and to identify any areas where automation could be used to improve data quality.

Look for these capabilities when you are investigating a data quality tool. One other thing to check is how easily the business can implement rules themselves. The closer you can get that step to the business, the better off your data will be.

There are many different tools out there that you can use to automate this task. Just a few are:

  • Informatica: Informatica offers a range of data quality and data governance solutions, including Informatica PowerCenter, which can be used to automate a variety of data validation tasks.
  • Talend: Talend offers a cloud-based data integration and data quality platform that includes a variety of data validation features.
  • Trillium Software: Trillium Software offers a range of data quality and data governance solutions, including Trillium Quality Center, which can be used to automate a variety of data validation tasks.
  • IBM: IBM offers a range of data quality and data governance solutions, including IBM Information Governance Catalog, which can be used to automate a variety of data validation tasks.
  • SAP: SAP offers a range of data quality and data governance solutions, including SAP Information Steward, which can be used to automate a variety of data validation tasks.

Remember, all of these tools need to tie back into the overall data strategy.  In some cases a data quality tool can pass rules directly to an ETL tool.  If you are in an ecosystem using a suite of tools, check whether it covers data quality.  You may already have this problem solved.

The reality is that even if you think your data is right, you need to check it again.  Your reputation is only as good as your data.  Remember to build out the processes in your framework, get the right tool to automate and then put in the work to check and recheck the output.  That hard work will pay off when the business stops asking if your data is right and starts making informed decisions with it.
