Building Your Data Strategy: The Pillars of Data Governance
Data governance is the cornerstone of a healthy and effective data strategy. It's the framework that ensures your data is reliable, secure, and valuable across the entire organization. Think of it like a three-legged stool. Each leg – Data Quality, Data Sharing (with Data Catalogs & Glossaries), and Data Lineage – is crucial for stability. If one leg is shorter than the others, the stool becomes wobbly, just as neglecting one area of data governance weakens the entire framework.
- Data Quality: This leg ensures data is accurate, consistent, complete, and reliable.
- Data Sharing: This leg promotes discoverability and understanding through tools like Data Catalogs (inventories of data assets) and Data Glossaries (standardized business term definitions).
- Data Lineage: This leg tracks the data's journey, providing transparency into its origins and transformations.
These three pillars work together to create a view of the
data that the business can understand, measure, and trust.
Today's Focus: Data Quality - Not Just a Theory, but a Practice
While it's important to understand the theory of data
quality, its true value lies in practical implementation within your data
architecture, specifically within your data pipelines. This means embedding
checks and controls at every stage to prevent data quality issues before they
can impact downstream processes.
Embedding Data Quality in Your Pipelines: Practical Steps
Rather than addressing data quality reactively after
problems arise, let's proactively build quality into the pipeline from the
ground up.
Validate Foreign Keys: Foreign keys ensure
referential integrity, making sure relationships between tables are consistent.
How
In your data pipelines, implement validation
steps that check if foreign key values in one table have corresponding primary
key values in the related table.
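As a minimal sketch of what such a validation step could look like, here is a plain-Python check that flags "orphaned" foreign keys before rows move downstream. The table names, column names, and the `find_orphan_keys` helper are hypothetical, chosen only for illustration:

```python
# Hypothetical pipeline step: flag child rows whose foreign key has no
# matching primary key in the parent table.
def find_orphan_keys(child_rows, fk_column, parent_keys):
    """Return child rows whose foreign key value is missing from parent_keys."""
    valid = set(parent_keys)
    return [row for row in child_rows if row.get(fk_column) not in valid]

# Example: orders must reference an existing customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 99},  # orphan: no customer 99 exists
]

orphans = find_orphan_keys(
    orders, "customer_id", (c["customer_id"] for c in customers)
)
# orphans -> [{"order_id": 11, "customer_id": 99}]
```

In a real pipeline this check would typically run against the database itself (for example, as an anti-join or a FOREIGN KEY constraint), with the orphaned rows quarantined rather than silently dropped.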
The Data Principle
Enforcing Foreign Keys. Once you adopt this principle in your data strategy, you will find many opportunities to apply it. Ensuring data relationships are always valid strengthens your entire data quality practice.
Enforce Nullability Properties: Defining whether
a column can or cannot contain NULL values based on business rules is critical
for data quality.
How
In your data pipelines, include checks to ensure that
required columns (defined as NOT NULL) contain data. Make sure to trap data that does not comply
and set up a way to notify the correct people.
At times, these issues are caused by process errors at the source and
can be easily corrected if the right person is notified.
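The trap-and-notify step above can be sketched as follows. This is a minimal illustration, not a full framework: the required-column list, the row shapes, and the `find_null_violations` helper are all hypothetical, and the notification is reduced to a print statement:

```python
# Hypothetical pipeline step: trap rows where a required (NOT NULL)
# column is missing, so the right person can be notified.
REQUIRED_COLUMNS = ["customer_id", "order_date"]  # assumed business rule

def find_null_violations(rows, required_columns):
    """Return (row_index, column) pairs where a required column is empty."""
    violations = []
    for i, row in enumerate(rows):
        for col in required_columns:
            if row.get(col) in (None, ""):
                violations.append((i, col))
    return violations

rows = [
    {"customer_id": 1, "order_date": "2024-01-05"},
    {"customer_id": None, "order_date": "2024-01-06"},  # violates NOT NULL
]

bad = find_null_violations(rows, REQUIRED_COLUMNS)
# bad -> [(1, "customer_id")]
for row_index, column in bad:
    # In a real pipeline this would route to quarantine and alert the data owner.
    print(f"Row {row_index}: required column '{column}' is empty")
```

Quarantining the offending rows (rather than failing the whole load or letting them through) keeps good data flowing while the source process error is investigated.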
The Data Principle
Enforcing Nullability. Here we are actively preventing missing or ambiguous data in critical fields. If this data is allowed to move into the final tables, it could change million-dollar decisions. As before, the cause of this data quality issue could be a process problem at the source, but we could also have introduced it ourselves with one of our transforms. We should never assume we don't write bad code; instead, we should always check our work.
It all adds up
No one wants to sit on a stool that wobbles. When you are always fighting to stay balanced, you don't have the time or energy to do whatever you originally sat down to do. The same is true of data governance: we need to keep it stable so the business can do what it really needs to do with the data, which is make decisions. When deliberate thought is put into data quality, you will be one step closer to that goal.
Now that we have covered data quality, tomorrow we will talk about practical ways to strengthen the next leg, Data Sharing.
As always, if you
found this blog post helpful in understanding how data governance and
practical data quality implementation can empower your organization, consider
sharing it with your network!