Building Your Data Strategy: The Pillars of Data Governance


 Data governance is the cornerstone of a healthy and effective data strategy. It's the framework that ensures your data is reliable, secure, and valuable across the entire organization. Think of it like a three-legged stool. Each leg – Data Quality, Data Sharing (with Data Catalogs & Glossaries), and Data Lineage – is crucial for stability. If one leg is shorter than the others, the stool becomes wobbly, just as neglecting one area of data governance weakens the entire framework.

  • Data Quality: This leg ensures data is accurate, consistent, complete, and reliable.
  • Data Sharing: This leg promotes discoverability and understanding through tools like Data Catalogs (inventories of data assets) and Data Glossaries (standardized business term definitions).
  • Data Lineage: This leg tracks the data's journey, providing transparency into its origins and transformations.

These three pillars work together to create a view of the data that the business can understand, measure, and trust.

Today's Focus: Data Quality - Not Just a Theory, but a Practice

While it's important to understand the theory of data quality, its true value lies in practical implementation within your data architecture, specifically within your data pipelines. This means embedding checks and controls at every stage to prevent data quality issues before they can impact downstream processes.

Embedding Data Quality in Your Pipelines: Practical Steps

Rather than addressing data quality reactively after problems arise, let's proactively build quality into the pipeline from the ground up.

Validate Foreign Keys: Foreign keys ensure referential integrity, making sure relationships between tables are consistent.

How

In your data pipelines, implement validation steps that check whether foreign key values in one table have corresponding primary key values in the related table.
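As a minimal sketch of that validation step, the function below looks for "orphan" rows: child rows whose foreign key has no matching primary key in the parent table. The table names, column names, and sample rows (customers, orders, customer_id) are hypothetical, and the rows are plain Python dictionaries standing in for whatever your pipeline actually reads. It also assumes a NULL foreign key means "no relationship" rather than a violation, which may not match your business rules.

```python
from typing import Any, Iterable


def find_orphan_rows(
    child_rows: Iterable[dict[str, Any]],
    fk_column: str,
    parent_rows: Iterable[dict[str, Any]],
    pk_column: str,
) -> list[dict[str, Any]]:
    """Return child rows whose foreign key has no matching parent primary key."""
    # Collect the parent keys once so each child lookup is a fast set check.
    parent_keys = {row[pk_column] for row in parent_rows}
    return [
        row
        for row in child_rows
        # Assumption: a NULL foreign key is "no relationship", not an orphan.
        if row[fk_column] is not None and row[fk_column] not in parent_keys
    ]


# Hypothetical sample data standing in for two tables at one pipeline stage.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 3},  # customer 3 does not exist
    {"order_id": 102, "customer_id": None},
]

orphans = find_orphan_rows(orders, "customer_id", customers, "customer_id")
print(orphans)  # only order 101 is reported
```

In a real pipeline you would likely run the same idea as an anti-join in SQL or in your dataframe library of choice, and decide whether orphans stop the load or are routed to a quarantine table.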

The Data Principle

Enforcing Foreign Keys. Once you adopt this as a principle in your data strategy, you will find many opportunities to apply it. Making sure data relationships are always valid strengthens your data quality practice.

Enforce Nullability Properties: Defining whether a column can or cannot contain NULL values based on business rules is critical for data quality.

How

In your data pipelines, include checks to ensure that required columns (defined as NOT NULL) contain data.  Make sure to trap data that does not comply and set up a way to notify the correct people.  At times, these issues are caused by process errors at the source and can be easily corrected if the right person is notified. 
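One way to sketch the trap-and-notify idea is shown below. The function splits incoming rows into those that satisfy the NOT NULL rules and those that do not, recording which required columns were missing. The column names, sample rows, and the notify_data_owner function are all hypothetical placeholders; a real pipeline would send the rejected rows to a quarantine table and alert through email, chat, or a ticketing system.

```python
from typing import Any, Iterable


def split_on_required_columns(
    rows: Iterable[dict[str, Any]],
    required_columns: list[str],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate rows that satisfy the NOT NULL rules from rows that break them."""
    valid: list[dict[str, Any]] = []
    rejected: list[dict[str, Any]] = []
    for row in rows:
        # A column counts as missing if it is absent or explicitly None.
        missing = [col for col in required_columns if row.get(col) is None]
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, rejected


def notify_data_owner(rejected: list[dict[str, Any]]) -> None:
    """Hypothetical placeholder: swap in your real alerting mechanism."""
    for item in rejected:
        print(f"Rejected row {item['row']}: missing {item['missing']}")


# Hypothetical sample data; order_total is required by the business rule.
incoming = [
    {"order_id": 100, "order_total": 25.0},
    {"order_id": 101, "order_total": None},
    {"order_id": 102},
]

valid_rows, rejected_rows = split_on_required_columns(
    incoming, ["order_id", "order_total"]
)
if rejected_rows:
    notify_data_owner(rejected_rows)
```

Only valid_rows would continue to the next stage. Keeping the rejected rows, along with the reason they were rejected, gives the person you notify enough detail to fix the problem at the source.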

The Data Principle

Enforcing Nullability. Here we are actively preventing missing or ambiguous data in critical fields. If this data is allowed to move into the final tables, it could change million-dollar decisions. As before, the cause could be a process problem at the source, but we could also have introduced the problem with one of our own transforms. We should never assume we don't write bad code. Instead, we should always check our work.

It all adds up

No one wants to sit on a stool that wobbles. When you are always fighting to stay balanced, you don't have the time or energy to do whatever you originally sat down to do. The same is true of data governance. We need to keep it stable so that the business can do what it really needs to do with the data: make decisions. Putting deliberate thought into data quality brings you one step closer to that goal.

Now that we have covered data quality, tomorrow we will talk about practical ways to strengthen the next leg, Data Sharing.

As always, if you found this blog post helpful in understanding how data governance and practical data quality implementation can empower your organization, consider sharing it with your network!
