Data Principles: Using Nothing to get the job done

At first glance, nothing does not seem to have much value. In the world around us, we walk past “nothing” all the time without even noticing it. But at times, “nothing” is just as important as something. Imagine walking down a city street after someone has removed a manhole cover. That “nothing” would certainly have a huge impact if you fell into the manhole.

Another way “nothing” can affect you is that it allows others to make assumptions. Have you ever been in a conversation with a friend who asked you a question you did not want to answer? Instead of telling them, “I would prefer not to comment,” did you simply say nothing and leave the question hanging in the air? You were hoping they would infer that you did not want to comment, but is that what happened?

You see, when you did not specifically answer the question, you allowed your friend to answer for you, and that is where drama always starts.

So what does “nothing” have to do with your data strategy? The main goal of any data strategy is to give the business quality data that it can act on. Not having a defined way to deal with “nothing” leaves too many unanswered questions in your data, and those unanswered questions will cause your overall data strategy to fail. How do we avoid failure?

Understanding NULLs

Let’s start by clearly defining what “nothing” means in the world of data. This concept is closely tied to a column attribute known as nullability. The attribute is binary: a column either allows NULL values or it does not. And what does NULL mean? A NULL value represents "no value," "unknown," or "not applicable." A NULL is not an empty string or a default value, both of which can misrepresent data. A NULL is simply “nothing.”
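
To see that NULL really is “nothing” rather than a value, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption chosen only because it ships with Python. The query shows that comparing NULL to NULL with = does not return true, that IS NULL is the proper test, and that an empty string is a value, not NULL. Most SQL databases agree on the NULL comparisons, but empty-string handling varies (Oracle, for instance, stores an empty string as NULL).

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient standard-library SQL engine.
conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT NULL = NULL, NULL IS NULL, '' IS NULL, '' = ''"
).fetchone()

# NULL = NULL is neither true nor false: it evaluates to NULL (None in Python).
# IS NULL is the correct test for "nothing"; an empty string is not NULL.
print(row)  # (None, 1, 0, 1)
conn.close()
```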

How NULL Is Defined by Business Rules

Whether a column allows NULL values should be based on business rules. Consider a table storing user information with a "Preferred Name" column.

The business rule states that users can provide a preferred name, but it is not required.  To enable this in the data model, the "Preferred Name" column is set to allow NULL values.  Because of this setting, a NULL value in this column indicates the user chose not to provide a preferred name.
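
The rule above translates directly into a table definition. The sketch below assumes SQLite through Python's sqlite3 module, and the users table with its user_id, legal_name, and preferred_name columns is hypothetical. The required name is declared NOT NULL, the preferred name is left nullable, and the database itself rejects a row that is missing the required value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical users table. legal_name is required (NOT NULL);
# preferred_name has no NOT NULL constraint, so it accepts NULL.
conn.execute(
    """
    CREATE TABLE users (
        user_id        INTEGER PRIMARY KEY,
        legal_name     TEXT NOT NULL,
        preferred_name TEXT
    )
    """
)

insert = "INSERT INTO users (legal_name, preferred_name) VALUES (?, ?)"
# Python's None is stored as SQL NULL: this user chose not to give a preferred name.
conn.execute(insert, ("Robert Smith", None))
conn.execute(insert, ("Elizabeth Jones", "Liz"))

# The database rejects a missing value in a required column.
rejected = False
try:
    conn.execute(insert, (None, "Sam"))
except sqlite3.IntegrityError:
    rejected = True

# IS NULL finds the users who did not provide a preferred name.
no_preference = conn.execute(
    "SELECT legal_name FROM users WHERE preferred_name IS NULL"
).fetchall()
print(rejected, no_preference)  # True [('Robert Smith',)]
conn.close()
```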

What happens if this nullability principle is not enforced properly? If the column did not allow NULL values and was instead filled with a default value such as a blank string or "N/A", a data quality problem arises. Instead of accurately reflecting that the user does not have a preferred name, the data now says their preferred name is "N/A".
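
The problem is easy to reproduce. This minimal sketch assumes SQLite through Python's sqlite3 module and uses hypothetical table names. It loads the same two users twice: once with NULL for the missing preferred name, and once into a NOT NULL column that falls back to an 'N/A' default. A display-name query built on COALESCE and a simple COUNT then give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# users_null stores "no preferred name" as NULL; users_na declares the
# column NOT NULL and falls back to an 'N/A' default.
conn.execute("CREATE TABLE users_null (legal_name TEXT NOT NULL, preferred_name TEXT)")
conn.execute(
    "CREATE TABLE users_na (legal_name TEXT NOT NULL, "
    "preferred_name TEXT NOT NULL DEFAULT 'N/A')"
)

conn.execute("INSERT INTO users_null VALUES ('Robert Smith', NULL)")
conn.execute("INSERT INTO users_null VALUES ('Elizabeth Jones', 'Liz')")
conn.execute("INSERT INTO users_na (legal_name) VALUES ('Robert Smith')")  # default -> 'N/A'
conn.execute("INSERT INTO users_na VALUES ('Elizabeth Jones', 'Liz')")

results = {}
for table in ("users_null", "users_na"):  # fixed table names, safe to format into SQL
    # Display name: the preferred name if present, otherwise the legal name.
    names = [r[0] for r in conn.execute(
        f"SELECT COALESCE(preferred_name, legal_name) FROM {table} ORDER BY legal_name"
    )]
    # COUNT(column) counts only non-NULL values, i.e. users who gave a preferred name.
    provided = conn.execute(f"SELECT COUNT(preferred_name) FROM {table}").fetchone()[0]
    results[table] = (names, provided)

print(results)
conn.close()
```

With NULL, Robert is displayed by his legal name and only one user is counted as having a preferred name. With the placeholder, Robert is displayed as "N/A" and both users are counted.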

The Impact of Nothing

Enforcing nullability correctly based on business rules makes data predictable.

  • For data engineers: They know what to expect and can write code that checks for NULLs, which simplifies testing and reduces ambiguous cases. Error handling also becomes simpler, so pipelines can deal with bad data with less complexity.
  • For data analysts: They gain confidence in their analyses, preventing misinterpretations and ensuring insights are based on accurate representations of reality.
  • For business users: They will not have to guess what the data in front of them means. With a clear understanding, they can make decisions faster and more accurately.
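
For data engineers, checking for NULLs can be as simple as validating each record against the nullability rules before loading it. The sketch below is one hypothetical way to do this in plain Python: USER_SCHEMA, PLACEHOLDERS, and nullability_violations are illustrative names, not part of any particular tool. It reports NULLs in required columns and placeholder strings such as "N/A" that stand in for nothing.

```python
from typing import Any

# Hypothetical nullability rules taken from business rules: column -> nullable?
USER_SCHEMA = {"user_id": False, "legal_name": False, "preferred_name": True}

# Strings that often stand in for "nothing" (an assumed list; adjust to your data).
PLACEHOLDERS = {"", "N/A", "NULL", "NONE"}


def nullability_violations(record: dict[str, Any], schema: dict[str, bool]) -> list[str]:
    """Return the nullability problems found in one record."""
    problems = []
    for column, nullable in schema.items():
        value = record.get(column)  # a missing key is treated as NULL
        if isinstance(value, str) and value.strip().upper() in PLACEHOLDERS:
            problems.append(f"{column}: placeholder {value!r} hides a missing value")
        elif value is None and not nullable:
            problems.append(f"{column}: NULL not allowed")
    return problems


rows = [
    {"user_id": 1, "legal_name": "Robert Smith", "preferred_name": None},
    {"user_id": 2, "legal_name": None, "preferred_name": "Sam"},
    {"user_id": 3, "legal_name": "Ana Ruiz", "preferred_name": "N/A"},
]
for row in rows:
    print(row["user_id"], nullability_violations(row, USER_SCHEMA))
```

Only the second and third rows are reported: one is missing a required name, and the other hides a missing preferred name behind "N/A".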

The fact is, nothing matters. Applying the principle of enforcing nullability forces you to use business rules to decide where nothing matters. When your data platform understands where nothing matters, your business users will as well. That is how you can use nothing to make sure your data strategy hits its mark.
