Data Governance Anti-Patterns: Start From Scratch; Rinse, Repeat

This is part IV of our “Data Governance Anti-Patterns Series.” You can find part III here.

Anti-patterns are behaviors that take bad problems and lead to even worse solutions. In the world of data governance, they’re everywhere. This blog’s anti-pattern again serves as an example of how initial intuition does not lend itself to a scalable solution – we’ll be talking about how to define conditions for when controls on data should be enforced or not.

First, let’s take a simple analogy to help set the stage.  We all have a front door on our house, and typically each person who lives in the house has a key to that door.  When you decided who to let into your house, you didn’t do this:

I am not giving a key to Sally, Brian, Ben…(list everyone in the world)…

You instead lock everyone out and create exceptions.  The exceptions are the people that have the key to your house.  You also make temporary exceptions, for example, when a babysitter is coming over, they are given a key for a short period of time.  The foundational behavior is to lock everyone out otherwise.

Just like deciding who gets a key to your house, data policy conditions should almost always be exceptions – you must have foundational behavior defined, and then set exceptions to that behavior.  Certainly this is how you GRANT access to tables in your own organization, e.g. nobody can see the data until they’ve been GRANTed access to the table.  The key assumption here is the foundational behavior: nobody can see the data until…  However, this foundational behavior is too simplistic for real scenarios.  Your foundational behavior will be a complex set of table and column controls, and you will then layer new exceptions on top of that.
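As a minimal sketch of "lock everyone out, then make exceptions", the Python below models table access as a default deny with explicit grants, some of which expire like the babysitter's key. The `Grant` class, the `can_read` helper, and the user names, table name, and dates are all hypothetical and chosen only for illustration; this is not how any particular database implements GRANT.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """An exception to the default of 'nobody can see the data'."""
    user: str
    table: str
    expires: Optional[date] = None  # None means the grant does not expire


def can_read(grants: list[Grant], user: str, table: str, today: date) -> bool:
    """Default deny: access exists only if an unexpired grant says so."""
    for g in grants:
        if g.user == user and g.table == table:
            if g.expires is None or today <= g.expires:
                return True
    return False


# Hypothetical grants: one permanent key, one temporary key.
grants = [
    Grant("sally", "employee"),
    Grant("ben", "employee", expires=date(2024, 1, 31)),
]

print(can_read(grants, "sally", "employee", date(2024, 2, 15)))
print(can_read(grants, "ben", "employee", date(2024, 2, 15)))
```

Nobody is ever listed as denied. Anyone without a matching, unexpired grant simply falls through to the default.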

The anti-pattern creeps in when the foundational behavior is undefined or assumed to be some flat global default, like “you can’t see anything”.  For example, we’ve seen customers break down all their data access controls based on sets of data sharing agreements.  On the surface this makes sense.  You’d see something like this:

If working in data sharing agreement x

  • allow access to the employee table, but mask the name, phone number, and address columns
  • allow access to the client table

If working in data sharing agreement y

  • allow access to the employee table
  • allow access to the client table, but mask the name column

If working in data sharing agreement z

  • allow access to the employee table, but mask the name, phone number, and address columns
  • allow access to the client table, but mask the name column

In this situation, the assumed foundational behavior is that you can’t see anything until you’re working under a data sharing agreement.  This is problematic because you are going to build the same policies over and over again for every data sharing agreement.  As you can see, data sharing agreement z repeats policies from both agreements x and y.  On top of that, foundational, day-to-day policies already exist on that data and may grant access to tables/columns before any data sharing agreement applies (for example, the user may already have access to the employee table without the data sharing agreement).
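To make the redundancy concrete, here is a rough sketch that writes the three agreements above as plain Python data and reports which rules are spelled out more than once. The dictionary layout, the column name `phone_number`, and the `repeated_rules` helper are assumptions made for this illustration only.

```python
# Each agreement spells out its full policy from scratch (the anti-pattern).
# Table name -> set of masked columns; an empty set means nothing is masked.
FROM_SCRATCH = {
    "x": {"employee": {"name", "phone_number", "address"}, "client": set()},
    "y": {"employee": set(), "client": {"name"}},
    "z": {"employee": {"name", "phone_number", "address"}, "client": {"name"}},
}


def repeated_rules(policies):
    """Return each (table, masked columns) rule that appears in more than one agreement."""
    seen = {}
    for agreement, tables in policies.items():
        for table, masked in tables.items():
            seen.setdefault((table, frozenset(masked)), []).append(agreement)
    return {rule: names for rule, names in seen.items() if len(names) > 1}


for (table, masked), agreements in repeated_rules(FROM_SCRATCH).items():
    print(table, sorted(masked), "is written out in agreements", agreements)
```

With only three agreements, two rules are already duplicated. Every new agreement written this way adds more copies to keep in sync.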

The anti-pattern is that you are defining all policies from scratch for each data sharing agreement.  In other words, you’re deciding what to give the user from scratch for every use case – BAD.  Instead, you should set foundational policies and create exceptions to that behavior for each use case – GOOD.  The babysitter was exactly this kind of exception to your default behavior of locking your door: you make exceptions under certain scenarios.  To be clear, the foundational behavior should not be something like “mask everything”.  It is going to be more complex than locking your door: it needs to be the real policies for day-to-day access to your data, with special exceptions built on top of that.

 

Those same data sharing agreements could instead be written like this:

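[Figure: the same data sharing agreements rewritten as foundational policies with exceptions, with a legend; image not recovered.]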
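One way such a rewrite could look is sketched below in Python. This is an illustration, not a reconstruction of the original figure. It assumes the foundational rules are "mask name, phone number, and address on the employee table" and "mask name on the client table", that group HR is exempt from the employee rule, and that agreements x and y act as purposes that lift one rule each. The `MaskRule` class, the `masked_columns` helper, and the purpose names are hypothetical, and table-level access is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class MaskRule:
    """Mask `columns` of `table` for everyone, except under the listed conditions."""
    table: str
    columns: frozenset
    except_groups: set = field(default_factory=set)
    except_purposes: set = field(default_factory=set)


# Foundational, day-to-day policies, each stated once with its exceptions.
FOUNDATION = [
    MaskRule("employee", frozenset({"name", "phone_number", "address"}),
             except_groups={"HR"}, except_purposes={"agreement y"}),
    MaskRule("client", frozenset({"name"}),
             except_purposes={"agreement x"}),
]


def masked_columns(rules, table, groups=frozenset(), purpose=None):
    """Columns of `table` that stay masked for a user in `groups` acting under `purpose`."""
    masked = set()
    for rule in rules:
        if rule.table != table:
            continue
        exempt = bool(rule.except_groups & set(groups)) or purpose in rule.except_purposes
        if not exempt:
            masked |= rule.columns
    return masked


for purpose in ("agreement x", "agreement y", "agreement z"):
    for table in ("employee", "client"):
        print(purpose, table, sorted(masked_columns(FOUNDATION, table, purpose=purpose)))
```

Under these assumptions, agreement z needs no entry at all: it is just the foundation with no exception applied. Each masking rule is written exactly once.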

You’ve now written your policies in a way that makes sense for data access request workflows in your organization and made obvious what the foundational behavior is instead of it being assumed.  

Let’s walk through what I mean.  Let’s say Sally is exploring data and she thinks she really needs to use the address column for some new analysis.  Without the exception-based technique, the flow looks like this:

  • Sally says: “Hi compliance leadership, I need access to the address column to do this new analysis”.
  • Compliance leadership now needs to decide what the user should see or not see, from scratch, for that use case and bake it into the new project’s policy.  Mistakes can happen very easily because they must consider everything Sally needs to see.  She’d in fact have to request a lot more than just that column – which is the real problem.

But if we have foundational behavior with exceptions, the same request fits the workflow perfectly.

  • Sally now says: “Hi compliance leadership, I need access to the address column to do this new analysis”.
  • Compliance simply changes the policy to look like:

You simply add the exception to the existing policy.  But here’s the important part – if Sally were in group HR, she would never have had to make this request, because the foundational policy would already show her everything she needs!  Note that if the default behavior is to “hide everything”, you never gain any scale: everything must always be requested from scratch, and all the policies must be built from scratch.
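The change compliance makes can be sketched as a single added exception. The Python below assumes a foundational policy that masks three employee columns for everyone outside group HR. The `employee_policy` structure, the helper functions, and the purpose name "project Sally" are illustrative assumptions, not Immuta's policy syntax.

```python
# Foundational policy for the employee table: each masked column maps to the
# conditions under which it is unmasked. Group and purpose names are hypothetical.
employee_policy = {
    "name":         {"groups": {"HR"}, "purposes": set()},
    "phone_number": {"groups": {"HR"}, "purposes": set()},
    "address":      {"groups": {"HR"}, "purposes": set()},
}


def visible_columns(policy, groups=(), purpose=None):
    """Return the masked-by-default columns this user can see unmasked."""
    return {
        column
        for column, exceptions in policy.items()
        if exceptions["groups"] & set(groups) or purpose in exceptions["purposes"]
    }


def add_purpose_exception(policy, column, purpose):
    """Layer one exception onto the existing policy; nothing else is rewritten."""
    policy[column]["purposes"].add(purpose)


# Sally's request, once approved, is a single addition to the existing policy.
add_purpose_exception(employee_policy, "address", "project Sally")

print(sorted(visible_columns(employee_policy, purpose="project Sally")))
print(sorted(visible_columns(employee_policy, groups={"HR"})))
```

A user in group HR sees all three columns without any request, while a user working under the new purpose gains only the address column. The rest of the foundational policy is untouched.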

The key takeaway is that you must define specific foundational policies (not a global default) and then create exceptions to those policies for various scenarios.  This removes redundancy, reduces mistakes, and gives better visibility into what exceptions exist across your organization.  Starting from scratch for every data use is the anti-pattern.

Immuta enables this behavior through our Projects feature as well as exception-based policy construction in our fine-grained data controls.  You can create purposes for which exceptions can be made, and those purposes are associated with the projects users are working in.  This way Sally can switch her current project to project Sally and immediately start seeing the address column because of the policy exception she requested.  Compliance can also easily see which users have accessed the employee table under project Sally and build logic around why users can be granted access to project Sally.

In short, Immuta provides you the capability to set the day-to-day foundational policies easily and then extend that logic to special cases based on purposes like data sharing agreements.  You must think “foundation then exceptions”, never “start from scratch for every use case.”
