Data Principles: The Power of Naming Standards


In my last post, I highlighted five data principles that I have used in the past to create a solid data strategy. The idea behind a principle is that it is flexible enough to adjust to changing needs but rigid enough to maintain order.

A good example of this is the golden rule, "do unto others as you would have them do unto you." When you apply this principle, you are free to do whatever you want, as long as it is something you would want done to you. It is a very flexible principle that can have serious consequences if broken.
One crucial principle that forms a solid foundation for your data initiatives is naming standards. It might seem like a simple concept, but consistent naming conventions throughout your data environment can significantly impact efficiency, governance, and understanding for everyone involved. With the right naming standards, anyone looking into the data environment, whether a DBA, developer, or user, will have no question about what they are looking at.
A Shared Language for Data Professionals
For data engineers, naming standards are a fundamental building block. Imagine working with a sprawling data environment where every object, from tables and columns to stored procedures and pipelines, has an arbitrary, inconsistent name. Locating the right data or understanding the purpose of a particular table would become a frustrating exercise in guesswork.
Enforcing clear naming standards provides:
  • Logical grouping: Data engineers can easily group similar objects and processes together. For example, tables related to customer data could all start with "cust_", or pipelines processing sales information could be prefixed with "sales_". Another naming standard could be based on the object type and use. This could look like "edwh_sales_orders_tbl". In this example, the prefix "edwh" is the use: it indicates that this is a main fact table in the edwh. The suffix "tbl" indicates that the object type is a table. This suffix is important if you implement a different principle that does not allow end users to access tables directly. With it, you can create a view with the suffix "vw", such as "edwh_sales_orders_vw", and still keep the same core object name. Either of these standards creates a logical structure that's intuitive and easy to navigate.
  • Reduced ambiguity: The confusion about what to name database objects or coding structures is eliminated. With a predefined framework, data engineers can quickly and confidently name new data assets, saving time and effort.
  • Enhanced collaboration: A shared understanding of the data landscape enables seamless collaboration among data engineers and other data professionals. Everyone can quickly identify and interpret data assets, fostering efficient teamwork.
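To make the "use, core name, object type" pattern concrete, here is a minimal Python sketch of how such a standard could be checked automatically. It assumes a convention of the form use_subject_type, as in "edwh_sales_orders_tbl". The allowed prefixes and suffixes, and the function names parse_object_name and view_name_for, are hypothetical and only for illustration; your own standard will define its own lists.

```python
import re

# Illustrative assumptions: replace these with the values from your own standard.
ALLOWED_USES = {"edwh", "stg"}
ALLOWED_TYPES = {"tbl", "vw"}

# <use>_<subject>_<type>, all lowercase; the subject may contain underscores.
NAME_PATTERN = re.compile(
    r"^(?P<use>[a-z0-9]+)_(?P<subject>[a-z0-9]+(?:_[a-z0-9]+)*)_(?P<type>[a-z0-9]+)$"
)


def parse_object_name(name):
    """Return the parts of a conforming name, or None if it breaks the standard."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["use"] not in ALLOWED_USES or parts["type"] not in ALLOWED_TYPES:
        return None
    return parts


def view_name_for(table_name):
    """Derive the view name that shares the table's core object name."""
    parts = parse_object_name(table_name)
    if parts is None or parts["type"] != "tbl":
        raise ValueError(f"not a conforming table name: {table_name!r}")
    return f"{parts['use']}_{parts['subject']}_vw"


if __name__ == "__main__":
    print(parse_object_name("edwh_sales_orders_tbl"))
    print(view_name_for("edwh_sales_orders_tbl"))
    print(parse_object_name("SalesOrders"))
```

A check like this could run in a code review or deployment step, so that nonconforming names are caught before they reach the environment.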
Strengthening Data Governance
Effective data governance relies on clear data lineage and accountability. Naming standards play a vital role in achieving this:
  • Improved data lineage: Consistent naming makes it easier to trace the origin and transformation of data across your environment. By following a logical naming structure, you can quickly identify the source systems, transformations, and destinations of specific data elements.
  • Easier data assignment and ownership: Clear names facilitate the assignment of data ownership and responsibilities. Knowing what data is contained within a specific table or pipeline makes it straightforward to determine who is accountable for its quality and management.
  • Third-party tools: Tools that automate data lineage, such as Informatica, use object names as one part of their algorithm. When the tool is unsure of a connection because there are no keys or other identifying metadata, it will base its suggestions on names. With a consistent naming standard, these tools will be able to provide more accurate suggestions.
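As a rough illustration of why consistent names help name-based matching, the Python sketch below suggests lineage links between objects that share the same core name. This is not how Informatica or any specific tool works; it is a simplified stand-in that assumes the hypothetical use_subject_type convention, and the function names core_name and suggest_links are made up for this example.

```python
def core_name(name):
    """Return the subject part of a <use>_<subject>_<type> name, or None."""
    tokens = name.lower().split("_")
    # Names with fewer than three tokens do not follow the assumed convention.
    if len(tokens) < 3:
        return None
    return "_".join(tokens[1:-1])


def suggest_links(sources, targets):
    """Map each target to the source objects that share its core name."""
    by_core = {}
    for source in sources:
        core = core_name(source)
        if core is not None:
            by_core.setdefault(core, []).append(source)
    # Targets without a recognizable core name get no suggestions.
    return {target: by_core.get(core_name(target), []) for target in targets}


if __name__ == "__main__":
    sources = ["stg_sales_orders_tbl", "SalesOrd2"]
    targets = ["edwh_sales_orders_tbl", "edwh_cust_accounts_vw"]
    for target, matches in suggest_links(sources, targets).items():
        print(target, "<-", matches)
```

In this sketch, the conforming staging table is linked to its warehouse counterpart, while the inconsistently named "SalesOrd2" is never suggested for anything, which is the kind of gap a naming standard is meant to close.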
Empowering Database Administrators (DBAs)
DBAs are responsible for the performance and optimization of your data infrastructure. Naming standards directly benefit their work:
  • Streamlined optimization: When performing tasks like indexing or performance tuning, clear and consistent names help DBAs quickly identify the objects they need to work with. This speeds up their efforts and reduces the risk of errors.
  • Simplified management: Managing a large and complex database environment becomes easier when objects follow a predictable naming pattern. DBAs can readily understand the purpose and relationships of different objects, making their tasks more efficient.
Benefiting Data Consumers
Beyond the technical teams, naming standards also significantly impact data consumers like data scientists and business users:
  • Enhanced data understanding: With clear and intuitive naming, data scientists and business users can easily understand the meaning and content of data within the environment. This facilitates data exploration and analysis, enabling them to gain insights and make data-driven decisions.
  • Increased trust in data: Consistent naming conventions contribute to data quality and reliability. When data consumers trust the data they're working with, they can confidently leverage it for reporting, analysis, and other data-driven initiatives.
Did you realize how many people are impacted by this one principle?  Do you see how important it is for you to fully define your naming standards?  By creating a shared language for data, you empower your technical teams, strengthen data governance, and foster a data-driven culture throughout your organization. That is why this is a foundational step that sets the stage for a successful and sustainable data strategy.
