Data Principles: Consequences of Foreign Keys

As a child, did you enjoy doing your chores, or did you think your parents were just being mean? As a parent, why do you enforce the chore list even though you know your child will not be happy with you? In the moment, enforcing chores is not easy. Making your child comply takes effort and consistent oversight. Who has time for that, right? But after 18 years of that effort, what do you end up with? Don't those chores build a solid foundation for your child to stand on as they enter the adult world? Doesn't all of that constant oversight and hard work pay off when you proudly watch your very capable child move on to the next challenge, knowing they will be able to conquer it?

When implementing a data strategy, you face a similar question. Where do you put your time, effort, and energy? Do you move quickly for the quick win, bypassing things you know will become problems later, or do you lay the proper foundation, knowing that it will pay off in the future? That is where the data principle of enforcing foreign keys plays a vital role. This principle forces you to put in the time at the very beginning to plan for what you know is going to happen: bad data.

Foreign Keys: A Monitored Connection for Data Quality

To start connecting the dots, we first have to define what a foreign key is. A foreign key is a column (or set of columns) in a child table that references the primary key of a parent table, establishing a critical connection between the two tables. When this relationship is enforced, any non-null value entered into the foreign key column of the child table must have a corresponding entry in the primary key of the parent table. This prevents the creation of "orphaned" records: rows in the child table that have no valid connection to their associated parent data.
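To make the definition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customer, customer_order), the constraint name, and the helper try_insert_order are made up for illustration. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: its primary key is what the child table will reference.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Child table: every customer_id here must exist in customer.
conn.execute("""
    CREATE TABLE customer_order (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        CONSTRAINT fk_customer_customer_order
            FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")  # valid parent


def try_insert_order(order_id, customer_id):
    """Attempt to insert an order; report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO customer_order VALUES (?, ?)", (order_id, customer_id)
        )
        return "loaded"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Customer 999 does not exist, so this row would be an orphan.
orphan_result = try_insert_order(101, 999)
order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
print(orphan_result, order_count)
```

The second insert is refused by the database itself, so the orphaned order never reaches the table.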

Forcing Data Quality Processes in Data Pipelines

By enforcing foreign key constraints, data engineers are compelled to embed data quality processes directly into their data pipelines. This means:

Validation at the source: Data engineers must validate incoming data against the foreign key constraints before it's loaded into the child table. This could involve checking if a customer ID in an order table exists in the customer table before loading the order.

Error handling: If data doesn't conform to the foreign key constraint, the pipeline must have mechanisms to capture and handle these errors. This could involve logging the invalid data, moving it to an error table, or rejecting the entire batch.

This emphasis on data quality in the pipelines makes data engineers more accountable for the quality of the data they are processing and loading.
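The two pipeline steps above, validating before the load and handling what fails, could be sketched roughly as follows. This is one possible shape, not a prescribed implementation: the tables customer, customer_order, and order_load_error and the function load_orders are hypothetical, and the sketch routes bad rows to an error table rather than rejecting the whole batch.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a parent, a child, and an error table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the constraint in SQLite
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE customer_order (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
        );
        CREATE TABLE order_load_error (
            order_id INTEGER,
            customer_id INTEGER,
            reason TEXT NOT NULL
        );
        INSERT INTO customer (customer_id) VALUES (1), (2);
    """)
    return conn


def load_orders(conn, incoming_orders):
    """Load orders whose customer exists; send the rest to the error table."""
    known_customers = {
        row[0] for row in conn.execute("SELECT customer_id FROM customer")
    }
    loaded, rejected = 0, 0
    for order_id, customer_id in incoming_orders:
        if customer_id in known_customers:
            conn.execute(
                "INSERT INTO customer_order VALUES (?, ?)", (order_id, customer_id)
            )
            loaded += 1
        else:
            conn.execute(
                "INSERT INTO order_load_error VALUES (?, ?, ?)",
                (order_id, customer_id, "customer_id not found in customer"),
            )
            rejected += 1
    conn.commit()
    return loaded, rejected


conn = build_demo_db()
summary = load_orders(conn, [(100, 1), (101, 999), (102, 2)])
errors = conn.execute("SELECT * FROM order_load_error").fetchall()
print(summary, errors)
```

Because the constraint is still enforced on customer_order, the validation step is a courtesy to the pipeline, not the last line of defense: if the check were skipped, the database would still refuse the orphaned row.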

Architectural Accountability for Data Quality

The principle of enforcing foreign keys also places responsibility on the data architect for ensuring data quality:

Data error capturing processes: The architect must design and implement the architecture to capture data errors that violate foreign key constraints. This could involve establishing error tables, logging mechanisms, or alerting systems to notify relevant teams of data quality issues.

Data error reporting processes: The architect must design reporting mechanisms to provide visibility into data quality issues. This allows for monitoring and tracking of errors, enabling data governance processes and continuous improvement of data quality.
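As a rough illustration of capture plus reporting, the sketch below keeps violations in a log table and summarizes them per source system. The table fk_violation_log, its columns, the source system names, and both helper functions are assumptions made for this example; a real architecture would likely add alerting and retention rules on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical error-capture table: one row per foreign key violation.
conn.execute("""
    CREATE TABLE fk_violation_log (
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP,
        source_system TEXT NOT NULL,
        constraint_name TEXT NOT NULL,
        offending_key TEXT NOT NULL
    )
""")


def capture_violation(source_system, constraint_name, offending_key):
    """Record a single violation so it can be reported on later."""
    conn.execute(
        "INSERT INTO fk_violation_log (source_system, constraint_name, offending_key) "
        "VALUES (?, ?, ?)",
        (source_system, constraint_name, str(offending_key)),
    )


def violation_report():
    """Count violations per source system and constraint, worst offenders first."""
    return conn.execute("""
        SELECT source_system, constraint_name, COUNT(*) AS violations
        FROM fk_violation_log
        GROUP BY source_system, constraint_name
        ORDER BY violations DESC, source_system
    """).fetchall()


capture_violation("web_shop", "fk_customer_customer_order", 999)
capture_violation("web_shop", "fk_customer_customer_order", 1001)
capture_violation("call_center", "fk_customer_customer_order", 42)

report = violation_report()
for source_system, constraint_name, violations in report:
    print(source_system, constraint_name, violations)
```

A summary like this gives data governance a starting point: it shows which source produces the most non-conforming rows and which relationship they break.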

Empowering the Business and Enhancing the Data Team's Reputation

When foreign key constraints are enforced, the business is forced to address data quality issues. This could involve:

Identifying and resolving data errors at the source: The business needs to investigate why data is not conforming to the foreign key constraints and implement corrective actions in their data sources or processes.

Defining rules for handling non-conforming data: The business needs to decide how to handle data that doesn't comply, such as whether to reject it, quarantine it, or find a way to make it compliant.
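Once the business has chosen a rule, the pipeline can apply it mechanically. The sketch below shows the three options named above as a small policy switch. Everything in it is hypothetical: the Policy names, the apply_policy function, and in particular the "make it compliant" strategy, which here assumes a placeholder "unknown customer" row with ID 0 exists in the parent table.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"
    QUARANTINE = "quarantine"
    DEFAULT_PARENT = "default_parent"


# Hypothetical placeholder parent row; it must already exist in the parent table.
UNKNOWN_CUSTOMER_ID = 0


def apply_policy(order, known_customers, policy):
    """Return (destination, row) for one incoming order under the chosen policy."""
    if order["customer_id"] in known_customers:
        return ("load", order)
    if policy is Policy.REJECT:
        return ("drop", order)
    if policy is Policy.QUARANTINE:
        return ("quarantine", order)
    # DEFAULT_PARENT: repoint the row at the placeholder parent, but keep the
    # original value so the business can still investigate and repair it.
    repaired = {
        **order,
        "customer_id": UNKNOWN_CUSTOMER_ID,
        "original_customer_id": order["customer_id"],
    }
    return ("load", repaired)


known_customers = {UNKNOWN_CUSTOMER_ID, 1, 2}
bad_order = {"order_id": 101, "customer_id": 999}

rejected = apply_policy(bad_order, known_customers, Policy.REJECT)
quarantined = apply_policy(bad_order, known_customers, Policy.QUARANTINE)
repaired = apply_policy(bad_order, known_customers, Policy.DEFAULT_PARENT)
print(rejected, quarantined, repaired)
```

Which policy is right is a business decision, not a technical one; the code only makes the chosen rule explicit and repeatable.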

This proactive approach to data quality, enabled by foreign keys, enhances the data team's reputation by demonstrating their commitment to delivering trustworthy data. By not allowing bad data into the tables, the data team ensures that reports, analyses, and data-driven decisions are based on accurate and reliable information.

The Interplay of Naming Standards and Foreign Keys

The data principle of enforcing foreign keys is closely connected to the principle of correct naming standards.

Clearer relationships: When foreign keys are named consistently, often with a prefix like "fk_" followed by the names of the tables involved (for example, fk_Order_tbl_Order_Detail_tbl), the relationships between tables are immediately clear.

Faster troubleshooting: If a foreign key constraint is violated, the clear naming convention makes it easy to identify the involved tables and columns, speeding up the troubleshooting process.
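A naming convention is easiest to keep when the names are generated rather than typed by hand. The sketch below assumes the pattern from the example above, "fk_" followed by the parent table and then the child table. The helpers fk_name and fk_clause and the Order_ID columns are invented for illustration. Many database engines include the constraint name in the violation message; SQLite's message does not, so the sketch only confirms that the name is stored with the table definition.

```python
import sqlite3


def fk_name(parent_table, child_table):
    """Build a constraint name: fk_<parent table>_<child table>."""
    return f"fk_{parent_table}_{child_table}"


def fk_clause(parent_table, child_table, column):
    """Build a named FOREIGN KEY clause for a column shared by both tables."""
    return (
        f"CONSTRAINT {fk_name(parent_table, child_table)} "
        f"FOREIGN KEY ({column}) REFERENCES {parent_table} ({column})"
    )


constraint_name = fk_name("Order_tbl", "Order_Detail_tbl")
clause = fk_clause("Order_tbl", "Order_Detail_tbl", "Order_ID")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Order_tbl (Order_ID INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Order_Detail_tbl ("
    "Order_Detail_ID INTEGER PRIMARY KEY, "
    "Order_ID INTEGER NOT NULL, "
    f"{clause})"
)

# SQLite keeps the original CREATE statement, including the constraint name.
stored_ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'Order_Detail_tbl'"
).fetchone()[0]
print(constraint_name)
print(stored_ddl)
```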

Just like your child, your data strategy and your data platform will perform better, be more stable, and require less support if they are built on a strong foundation. Using the data principle of enforcing foreign keys will force you to spend time up front setting up the processes and procedures to deal with the data quality issues you know you are going to face. It may be frustrating at the beginning, and it may even slow down your overall data strategy, but when you watch your data platform become successful, you will be just as proud as you are watching your child graduate from high school. In both situations, you will know that there is a solid foundation they can use to face and overcome any future challenges.
