Data Governance: What is behind the curtains?


Data governance is a subject that most people would rather let someone else cover.  It is hard to do, hard to maintain, and hard to communicate to the business.  The business just expects the data to be right; it does not want to know what goes on behind the curtains.  Today we pull back the curtains to expose some of the things you will need to consider when you make it to this stop on your data strategy roadmap.

First, let's define what data governance is.  In a nutshell, data governance is a set of business rules and processes, supplemented with technology, that produces standardized data that can be checked and verified.  A key part of data governance is that the results can be repeated and can be validated by external sources if need be.  In a heavily audited industry like banking, data governance is the tool you use to show the path your data has taken and the reasons for each change.  It will also naturally lead you down a road where you gain mastery over your reference data and control over how it is managed.  Data governance is a continually improving state of your data that gives you the confidence to tell your users to trust the data.

So what are the five main focuses for data governance?

  • Data catalog: A data catalog is a tool that helps you inventory and discover your data.  This tool is also at the heart of data lineage, the record of where your data came from and how it changed along the way.  Data lineage is very important, especially if you have a large organization or an organization that suffers from excessive technical debt.  When you research tools, make sure you understand what level of data lineage you need, because not all tools are created equal.  Some players in the industry include Alation, Collibra, and Informatica.
  • Data quality: Data quality tools help you identify and correct errors in your data.  A good data quality tool will make data profiling (looking for patterns like common names) simple and make outliers easy to find.  Once these patterns have been identified, the tool should also let you create business rules that can be applied to your ETL processes with ease.  A great data quality tool is simple enough for your business users to find and fix data quality issues without your team getting involved.  Example tools include Informatica, Talend, and Trillium Software.
  • Data security: Data security tools help you protect your data from unauthorized access, use, disclosure, disruption, modification, or destruction.  This part of your data governance program will also lean heavily on process and procedure.  Practices like using Active Directory groups to manage data access and configuring row-level access in the database will be key to managing data security, even though they do not require buying a particular piece of software.  A few tools that focus on the physical data and loss prevention include IBM Security Guardium, McAfee Data Loss Prevention, and Symantec Data Loss Prevention.
  • Data compliance: Data compliance tools help you comply with a variety of data protection and privacy laws and regulations.  The world of data compliance changes rapidly, so looking into software to supplement your process and procedure is highly recommended.  Example tools include AvePoint Compliance Guardian, Dell Boomi Automate, and IBM Security Guardium.
  • Data governance platform: A data governance platform is a comprehensive tool that helps you manage all aspects of data governance.  This software works as the conductor that ties your software, processes, and procedures into one organism.  If you have limited resources in your data governance journey, this is where you need to spend them.  Example tools include Collibra Data Governance Center, IBM Information Governance Catalog, and SAP Information Steward.
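
To make the data quality item more concrete, here is a minimal Python sketch of what profiling and a business rule look like underneath the tooling.  It is not how any of the products named above work internally.  The sample records, the field names, and the two rules are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical customer records; field names and values are made up for this sketch.
records = [
    {"id": 1, "state": "TX", "zip": "75001"},
    {"id": 2, "state": "TX", "zip": "75002"},
    {"id": 3, "state": "tx", "zip": "7500"},
    {"id": 4, "state": "Texas", "zip": "75004"},
    {"id": 5, "state": "OK", "zip": "73008"},
]


def profile(rows, field):
    """Count how often each distinct value of `field` appears."""
    return Counter(row[field] for row in rows)


def shape(value):
    """Reduce a value to its pattern: letters become 'A', digits become '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def shape_profile(rows, field):
    """Count how often each value pattern of `field` appears."""
    return Counter(shape(row[field]) for row in rows)


# A business rule here is just a named check that a record passes or fails.
# Both rules are assumptions about what "good" data means for this sample.
RULES = {
    "state_is_two_uppercase_letters": lambda row: re.fullmatch(r"[A-Z]{2}", row["state"]) is not None,
    "zip_is_five_digits": lambda row: re.fullmatch(r"[0-9]{5}", row["zip"]) is not None,
}


def violations(rows, rules):
    """Return (record id, rule name) pairs for every failed rule."""
    return [
        (row["id"], name)
        for row in rows
        for name, check in rules.items()
        if not check(row)
    ]


state_counts = profile(records, "state")    # value frequencies expose "tx" and "Texas"
zip_shapes = shape_profile(records, "zip")  # pattern frequencies expose the short zip
failed = violations(records, RULES)         # the rows an ETL step would reject or fix

print(state_counts)
print(zip_shapes)
print(failed)
```

Profiling surfaces the odd values, and the rules turn what you learned into checks that can run every time the data is loaded.  A commercial tool wraps the same idea in an interface your business users can work with.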

We know what we need, so how do we pick the right partners?  My past experience tells me that we need to keep this process simple, clear, and well documented.  Over the next 10 years these are the decisions that will be questioned the most, so we need to be able to back them up.

  • Identify your needs.  No matter which area you start in, you will need a focused list of requirements.  These will be a mix of what the business expects at the end of the project and the technical needs you may have.  Technical requirements include which cloud platform you need to use and which legacy sources you will be dealing with.  Document these carefully and distribute them to your team and your business stakeholders.  Do this early and often.
  • Check the industry for players and their market share.  Focus on the subcategory here: if you need a data catalog, look for data catalog tools specifically.  Market share is important, but it should not be the leading factor in your decision.  Having said that, there is a reason why a player has only been able to capture 1% of the market, so that should at least be a red flag.
  • Read the reviews.  Go to trusted sources for this information.  Gartner and other organizations specialize in compiling this.  You may have to pay a bit for it, but in the end you will be grateful.
  • Build a requirement matrix and get hands-on.  For me this step is the most fun.  You get to bring in the vendors and put them head to head with each other.  Sure, they say that they can profile data, but how intuitive is their interface?  When the data quality rule is exposed to the ETL process, is it easily integrated into the data flow?  How easy is it to use the feature that allows the business to collaborate on the definition of a data element?  Each of these requirements needs to turn into a weighted value that helps you isolate a shortlist of the top two or three.  That is when the real negotiations begin.
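
The requirement matrix from the last step can be sketched in a few lines of Python.  This is only one simple way to do it, a weighted average of 1 to 5 scores.  The requirement names, the weights, the vendor names, and the scores are all hypothetical placeholders for what you and your stakeholders would agree on.

```python
# Hypothetical weights agreed with stakeholders (higher means more important).
WEIGHTS = {
    "profiling_usability": 5,
    "etl_integration": 4,
    "business_collaboration": 3,
    "lineage_depth": 2,
}

# Hypothetical 1-5 scores collected during hands-on sessions with each vendor.
SCORES = {
    "Vendor A": {"profiling_usability": 4, "etl_integration": 3, "business_collaboration": 5, "lineage_depth": 2},
    "Vendor B": {"profiling_usability": 3, "etl_integration": 5, "business_collaboration": 2, "lineage_depth": 4},
    "Vendor C": {"profiling_usability": 2, "etl_integration": 2, "business_collaboration": 3, "lineage_depth": 5},
    "Vendor D": {"profiling_usability": 3, "etl_integration": 3, "business_collaboration": 3, "lineage_depth": 3},
}


def weighted_score(scores, weights):
    """Weighted average of one vendor's scores, on the same 1-5 scale."""
    total_weight = sum(weights.values())
    return sum(weights[req] * scores[req] for req in weights) / total_weight


def shortlist(all_scores, weights, top_n=3):
    """Rank vendors by weighted score, best first, and keep the top `top_n`."""
    ranked = sorted(
        ((vendor, weighted_score(scores, weights)) for vendor, scores in all_scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


for vendor, score in shortlist(SCORES, WEIGHTS):
    print(f"{vendor}: {score:.2f}")
```

Keeping the weights and the raw scores in the document you share with stakeholders is what lets you defend the decision later: anyone can see which requirement moved a vendor up or down the list.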

And as usual, communication is KEY to this process.  You need stakeholders in the POCs.  They need to contribute their opinions and weighted values.  The more they are included in the process of picking out the tools, the more they will be invested in their success.

Do not be afraid to pull back the curtains to expose what goes on in a solid data governance framework.  When you break down the steps they will not be nearly as daunting, and with the right communication and documentation you can make data governance a reality.
