Data Governance: Sharing the world around you

 


Have you ever watched a nature documentary where animals hunt as a pack?  They all work together to take down the prey, but what happens after there is no more need for teamwork?  Suddenly, the animals break into smaller groups and start fighting amongst themselves for the food.  Usually, the strongest gets its fill and then the rest take the scraps. 

I can already hear you asking yourself, “What does the animal kingdom have to do with data?”  In some organizations, what I just described plays out every day; the only difference is that the animals are wearing very expensive suits.  You see, the organization may have a façade of unity, but internally, each department is trying to drive its own agenda using its own version of the data. 

Most of the time, the intent is not malicious.  Each department really feels it is looking at the data the right way and seeing the right patterns to justify moving the business in a given direction.  How is it possible that so many people are looking at “the data” and getting different answers?

The truth is, in most cases, they are not looking at “the data”; they are looking at a COPY of the data.  They are also depending on oral tradition for their understanding of the data.  With these things combined, there is no way for everyone to come to the same answer.  In most cases, they don’t even know the equation they are trying to solve. 

I also know the next question you are asking: “What does any of this have to do with data governance?”  Well, when you combine the data principles that you have chosen for your data strategy with the tools that data governance gives you, your organization will understand what the data means, how it got to you, and, most importantly, why decisions are made.

Let’s tackle the easiest problem first: getting rid of the copies of data.  Today’s technology offers quite a few good options for solving this problem. 

Sharing Large Datasets in Modern Data Architecture:

Modern data architecture offers several ways to make sure your organization is using the same data, not just a copy.  Which one lines up with your data strategy will depend on a few things, including the maturity of your data ecosystem and the abilities of your team.  Do you have the staff to move everything into one dedicated place?  Maybe a data lakehouse is a better choice for you.  Is your data ecosystem full of legacy databases?  A data mesh or data fabric may fit your organization better. 

Here is more detail on the available choices:

Data Lakehouse: This architecture combines the scalability of data lakes with the structure of data warehouses. Because the data is stored in open formats, various tools and engines can access the same data. For example, Spark can handle data science workloads, while a warehouse-style engine handles traditional analytics and reporting.

Data Mesh: Organizations may have multiple data warehouses and lakes. This pattern promotes a decentralized approach to data ownership and sharing. Data owners can make their domain-specific data products accessible to other teams, ensuring distributed data is still discoverable and usable.

Data Fabric: This architecture creates a unified view of data across disparate sources. It makes data appear as if it resides in a single, central location. This simplifies access and allows users to query and analyze data without needing to know its underlying location or format.

Data Cloud: Cloud platforms provide scalable and flexible data storage and processing capabilities. This facilitates data sharing through various services and APIs.
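
To make the data fabric idea a little more concrete, here is a minimal sketch in Python.  It is not how any particular product works; the `DataFabric` class, the loader helpers, and the sample datasets are all hypothetical names made up for illustration.  The point is only that callers ask for a dataset by its logical name and never need to know whether it lives in a file or in a database.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List

Row = Dict[str, str]


class DataFabric:
    """Hypothetical registry that maps logical dataset names to loaders."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], List[Row]]] = {}

    def register(self, name: str, loader: Callable[[], List[Row]]) -> None:
        self._loaders[name] = loader

    def read(self, name: str) -> List[Row]:
        # Callers ask by logical name; storage details stay hidden.
        return self._loaders[name]()


def csv_loader(text: str) -> Callable[[], List[Row]]:
    """Build a loader for CSV text (a stand-in for a file in a data lake)."""
    def load() -> List[Row]:
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    return load


def sqlite_loader(conn: sqlite3.Connection, table: str) -> Callable[[], List[Row]]:
    """Build a loader for a table (a stand-in for a legacy database)."""
    def load() -> List[Row]:
        # The table name is trusted here; a real system would validate it.
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, map(str, r))) for r in cursor.fetchall()]
    return load


# Assumed sample data: one source is CSV text, the other a SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, customer_id TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("o1", "c1"), ("o2", "c2")])

fabric = DataFabric()
fabric.register("customers", csv_loader("customer_id,name\nc1,Ada\nc2,Grace\n"))
fabric.register("orders", sqlite_loader(conn, "orders"))

# Both reads look identical, even though the sources are very different.
customers = fabric.read("customers")
orders = fabric.read("orders")
```

Real data fabric platforms add much more on top of this (security, query pushdown, caching), but the core promise is the same: one way to ask for data, wherever it sits.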

Practical Examples:

Do you ever read definitions and still wonder what you are looking at?  Here are a couple of use cases to illustrate these data architectures:

Retail: A large retailer uses a data lakehouse to store customer data, sales transactions, and inventory information. The marketing team accesses this data using a data warehouse for customer segmentation and targeted marketing campaigns. The supply chain team utilizes Spark for optimizing inventory management and last-mile delivery.

Healthcare: A hospital system uses a data mesh to share patient records, lab results, and research data across departments. Doctors access patient records through an electronic health record (EHR) system, while researchers use specialized tools to analyze anonymized data for clinical trials and treatment development.

Beyond Access: Fostering Common Understanding with Data Governance Tools

Now we come to the second problem, one that needs special attention.  Even once everyone is looking at the same data, wrong assumptions will still be made if people don’t understand it.  What other data governance tools do you have in your toolbox?

Data Catalogs: These searchable inventories provide a single source of truth for all data assets, including metadata, descriptions, location, and lineage. They help users discover relevant datasets and understand their context.

Data Glossaries: These tools provide a standardized set of business term definitions. This ensures everyone speaks the same language when referring to data. They eliminate ambiguity and promote consistent interpretation of data across the organization.
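
If it helps to see the two tools side by side, here is a small Python sketch.  Everything in it is an assumption for illustration: the `CatalogEntry` and `Catalog` classes, the dataset names, the `s3://example-bucket/...` locations, and the glossary definition are all made up, and a real catalog product would offer far more.  It simply shows a catalog answering "what exists and where did it come from?" and a glossary answering "what do we mean by this term?".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: metadata about one dataset."""
    name: str
    description: str
    location: str
    upstream: List[str] = field(default_factory=list)  # direct sources


class Catalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def add(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[str]:
        """Return names of datasets whose name or description mentions keyword."""
        key = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if key in e.name.lower() or key in e.description.lower()
        )

    def lineage(self, name: str) -> List[str]:
        """List every upstream dataset, nearest first."""
        seen: List[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = Catalog()
catalog.add(CatalogEntry("raw_orders", "Order events as received from the web shop",
                         "s3://example-bucket/raw/orders"))
catalog.add(CatalogEntry("clean_orders", "Deduplicated orders with cancelled rows removed",
                         "s3://example-bucket/clean/orders", upstream=["raw_orders"]))
catalog.add(CatalogEntry("monthly_revenue", "Revenue summed per calendar month",
                         "s3://example-bucket/marts/revenue", upstream=["clean_orders"]))

# Assumed glossary: one agreed definition per business term.
glossary: Dict[str, str] = {
    "active customer": "A customer with at least one completed order in the last 90 days",
}


def define(term: str) -> str:
    """Look up a business term, ignoring case and surrounding spaces."""
    return glossary[term.strip().lower()]


found = catalog.search("orders")
sources = catalog.lineage("monthly_revenue")
meaning = define("Active Customer")
```

With something like this in place, a question such as "where does monthly revenue come from?" has one answer that everyone can look up, instead of one answer per department.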

Achieving Data Unity

These tools help everyone know what is in the data, how it was derived, and how the organization interprets it.  How does this help you?  Here are a few of the benefits:

Break down data silos: This fosters collaboration and knowledge sharing across departments.

Improve data literacy: This empowers all employees, regardless of technical expertise, to understand and use data effectively.

Enable self-service BI: This allows business users to access and analyze data independently, driving faster decision-making.

Promote data-driven decision-making: This ensures that all decisions are based on accurate, consistent, and well-understood data.

The truth is, your organization does not have to be like a pack of animals always fighting to make the right choice.  With a few well-placed data governance tools and a dedication to a few data principles, there will be plenty of information to go around!

As always, if you liked this post, please share it.  Your support is greatly appreciated.
