Data Principles: Using Nothing to get the job done
At first glance, the value of nothing is not
significant. In the world around us, we literally
walk past “nothing” all the time and don’t even notice it. But
at times, “nothing” is just as important as something. For example, imagine you were walking down a city
street and someone had removed a manhole cover.
That “nothing” would certainly have a huge impact if you fell into the
manhole.
Another way “nothing” can impact you is that it allows others
to make assumptions. Have you ever been in a conversation with a friend
when they asked you a question you did not want to answer? Instead of telling them, “I would prefer not
to comment,” did you just leave your comment unsaid, just hanging there in the
air? You were hoping they would infer that
you did not want to comment, but is that what happened?
You see, when you did not specifically answer the question,
you allowed your friend to answer for you, and that is where drama often
starts.
So what does “nothing” have to do with your data strategy? The main goal of any data strategy is to give
the business quality data that it can act on. Not having a defined way to deal with “nothing”
leaves too many unanswered questions in your data, and those open questions can cause your overall
data strategy to fail. How do we avoid
failure?
Understanding NULLs
Let’s start by clearly defining what “Nothing” means in the world
of data. This concept is closely aligned
with a column attribute known as nullability.
The attribute is binary: a column either allows NULL values or it does
not. And what does NULL mean? A NULL value represents "no value,"
"unknown," or "not applicable." A NULL is not an empty
string or a default value. Those stand-ins are real values, and they can misrepresent the data. A NULL is simply “nothing”.
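The difference between NULL and an empty string can be seen directly in a database. The snippet below is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. SQLite is only an assumption made for illustration, and other engines may differ in the details (Oracle, for example, treats an empty string as NULL).

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# Ask the database three questions in a single query:
#   1. Is NULL a NULL?
#   2. Is an empty string a NULL?
#   3. Is NULL equal to an empty string?
row = conn.execute("SELECT NULL IS NULL, '' IS NULL, NULL = ''").fetchone()
null_is_null, empty_is_null, null_equals_empty = row

print(null_is_null)       # 1    -> NULL really is "nothing"
print(empty_is_null)      # 0    -> an empty string is a value, not NULL
print(null_equals_empty)  # None -> comparing with NULL yields NULL, not true or false
```

The last result is the important one: because a comparison with NULL is itself unknown, queries must use IS NULL rather than = to find missing values.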
How NULL is defined by Business Rules
Whether a column allows NULL values should be based on
business rules. Consider a table storing user information with a
"Preferred Name" column.
The business rule states that users can provide a preferred
name, but it is not required. To enable
this in the data model, the "Preferred Name" column is set to allow
NULL values. Because of this setting, a NULL
value in this column indicates the user chose not to provide a preferred name.
What would happen if this nullability principle were not
enforced properly? If the column did not
allow NULL values and was instead filled with a default value like a blank string or
"N/A", a data quality problem would arise. Instead of accurately
reflecting that the user does not have a preferred name, the data now says their
preferred name is "N/A".
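The "Preferred Name" rule can be sketched in code. The example below assumes a hypothetical users table with legal_name and preferred_name columns (these names and the sample rows are invented for illustration) and uses Python's built-in sqlite3 module. It shows a nullable column, a required column, and what goes wrong when "N/A" is stored in place of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: legal_name is required, preferred_name is optional.
conn.execute("""
    CREATE TABLE users (
        user_id        INTEGER PRIMARY KEY,
        legal_name     TEXT NOT NULL,  -- business rule: always required
        preferred_name TEXT            -- business rule: optional, so NULL is allowed
    )
""")

conn.executemany(
    "INSERT INTO users (legal_name, preferred_name) VALUES (?, ?)",
    [
        ("Robert Smith", "Bob"),  # preferred name provided
        ("Maria Lopez", None),    # no preferred name, stored as NULL
        ("Wei Chen", "N/A"),      # placeholder stored instead of NULL
    ],
)

# Fall back to the legal name only when the preferred name is NULL.
display_names = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(preferred_name, legal_name) FROM users ORDER BY user_id"
    )
]
print(display_names)  # ['Bob', 'Maria Lopez', 'N/A']

# Count users who chose not to provide a preferred name.
missing = conn.execute(
    "SELECT COUNT(*) FROM users WHERE preferred_name IS NULL"
).fetchone()[0]
print(missing)  # 1 (the "N/A" row is not counted)

# The NOT NULL constraint rejects a row that breaks the business rule.
try:
    conn.execute("INSERT INTO users (legal_name, preferred_name) VALUES (NULL, 'Al')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The third user is displayed as "N/A" and is missing from the count of users without a preferred name, which is the data quality problem described above.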
The Impact of Nothing
Enforcing nullability correctly based on business rules
makes data predictable.
- For data engineers: They know what to expect and can write code that
checks for NULLs, simplifying testing and reducing ambiguous cases. Error handling will also be simplified, allowing
the pipelines to deal with bad data with less complexity.
- For data analysts: They gain confidence in their analyses, preventing
misinterpretations and ensuring insights are based on accurate
representations of reality.
- For business users: They will not have to interpret the data that they are
looking at. With a clear
understanding, decisions will be made more accurately and faster.
The fact is, nothing matters. Using the data principle to enforce nullability
forces you to use business rules to see where nothing matters. When your data platform understands where
nothing matters, then your business users will as well. That is how you can use nothing to make sure
your data strategy hits its mark.