(This is part two of our Data Governance Anti-Patterns series; part one can be found here.)
Anti-patterns are common responses to recurring problems that end up making those problems worse. In the world of data governance, they’re everywhere.
In today’s blog post, I’ll focus on one anti-pattern in particular: Role-Based Access Control (RBAC), an approach that has been around for a long time. The idea is to map access to roles, which in theory simplifies the rules around who can access what. But over the long term, RBAC is the wrong approach to take – it’s an anti-pattern.
Read on to learn why.
Let’s start with a really simple scenario to set the RBAC stage: We have 3 data sources to share: A, B, and C. But we can’t just have those data sources accessible to everyone – we need to put access controls on those data sources.
The RBAC approach is to create a single role for users who can see A, B, and C. Let’s call this Role 1. This makes a lot of sense, because if someone needs access to A, B, and C, we simply give them Role 1 in our identity management system, and they can access that data.
But sometimes approaches that make sense at first can eventually get out of hand.
For example, say we now have a fourth data source: D. Unfortunately, data source D has different criteria for access, so we need a new role for D: Role 2. In parallel, let’s say we’ve learned that the rules around data source C have changed, and users now need additional training before they can access it. So now we have to create a Role 3 for that.
This could go on for years, and before we know it, we have more roles than data sources. On top of that, when new users want to get access to data, they have no idea which role maps to what data and have no idea what to request. We call this problem “Role Explosion.” In the world of data science, this problem can – and frequently does – bring data governance programs to a halt.
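A minimal sketch of this failure mode (the role names and mapping below are hypothetical, not any particular identity system’s API) shows how RBAC reduces access to a role-to-source lookup – and why the role names themselves carry no meaning:

```python
# Hypothetical RBAC mapping: each new combination of access criteria
# forces a new role, so roles quickly outnumber data sources.
ROLE_GRANTS = {
    "Role 1": {"A", "B", "C"},  # original role covering A, B, and C
    "Role 2": {"D"},            # D has different access criteria
    "Role 3": {"C"},            # C now requires extra training
}

def accessible_sources(user_roles):
    """Union of all data sources granted by a user's roles."""
    sources = set()
    for role in user_roles:
        sources |= ROLE_GRANTS.get(role, set())
    return sources

# A user holding Role 2 and Role 3 can see C and D, but nothing in the
# role names explains *why* -- that knowledge lives with the approvers.
print(sorted(accessible_sources({"Role 2", "Role 3"})))  # ['C', 'D']
```

Note that the check itself is trivial; the hard part is knowing which opaque role to request in the first place.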
Before diving into the solution to this problem, let’s spend some time on why something that seemed to make sense in the beginning ended up making things worse in the end. To do that, we need to talk about the three W’s of data policies: who, why, and what.
- “Who” is how you describe your users. These are sometimes groups they belong to, sometimes attributes or authorizations they possess. For example, if you were applying for a mortgage, your “who” would be your debt-to-income ratio, your salary, your marital status, etc.
- “Why” is the reason, or policy behind why a user would get access to a particular data source or not. Going back to the mortgage example, it would be something like:
“has a debt-to-income ratio < 40% and made a down payment > 5% of the loan value and is employed”.
Notice that the “why” combines the “who” with logic that determines whether the loan should be granted.
- “What” is the data you should see once you do have access to the data. These can be thought of as privacy controls, like hiding rows and/or masking columns. These are also policies on the data. I’m not going to spend too much time on “what” in this blog post, but it’s worth mentioning.
So, did you figure out what happens in RBAC?
RBAC conflates the “who” and the “why” together into a single role. Because of this, you need a unique role for every unique combination of “who” and “why” (and even more terribly, sometimes also “what”). In our initial example, the “why” for C changed, which is the reason we needed to create a new role. The more complex your policies, the faster you’ll get role explosion.
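To make the combinatorics concrete, here is an illustrative back-of-the-envelope sketch (the group and policy names are made up): if roles must encode every pairing of “who” and “why”, the role count grows multiplicatively.

```python
from itertools import product

# Illustrative only: three kinds of users and three distinct access
# policies. RBAC needs a role per (who, why) pairing.
who_groups = ["analyst", "engineer", "contractor"]
why_policies = ["base access", "trained on C", "cleared for D"]

roles_needed = len(list(product(who_groups, why_policies)))
print(roles_needed)  # 9 roles for 3 groups x 3 policies
```

Add a “what” dimension (masking variants per role) and the product grows again – that is role explosion in miniature.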
The typical reaction to role explosion is manual approvals. For example, if Anne wants access to data source C, she’ll need to know to ask for Role 3 (already complicated for her to do), and then she needs to wait for someone (or many people) to approve her access to Role 3. Why have manual approvals appeared on the scene? It’s because the “why” is in the brains of the approvers rather than spelled out as explicit logic. It’s in their brains because the “why” got collapsed with the “who” into a meaningless role named “Role 3”. Role 3 literally means nothing except to those approvers.
The key to defeating this anti-pattern is to break out the three Ws.
First, it’s important to come up with an initial set of relevant attributes/groups for different user personas. This means, figure out the universe of needed “things” to describe users. Remember, we are describing users here, not giving them random roles. For example, we described the mortgage borrower with debt-to-income ratio, down payment amount, and employment history. These are the components of the “who.”
Next, you can take those “who” components and build your “why.” You can spell out the policy just like we did for the mortgage loan. We’ve taken the “why” out of the brains of the approvers and removed all subjectivity. This is powerful.
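As a sketch of this step, the mortgage “why” from earlier can be written as an explicit predicate over “who” attributes. The class and field names here are illustrative, not a real policy engine’s API:

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    # The "who": attributes that describe the user, not opaque roles.
    debt_to_income: float    # e.g. 0.35 means 35%
    down_payment_pct: float  # fraction of the loan value
    employed: bool

def loan_policy(who: Applicant) -> bool:
    """The "why", spelled out as explicit logic instead of a role name."""
    return (
        who.debt_to_income < 0.40
        and who.down_payment_pct > 0.05
        and who.employed
    )

print(loan_policy(Applicant(0.35, 0.10, True)))   # True
print(loan_policy(Applicant(0.45, 0.10, True)))   # False: DTI too high
```

Because the policy is code (or plain English backed by code) rather than an approver’s intuition, anyone can audit exactly why access was granted or denied.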
Lastly, you can also build policies on “what” users can see once they are granted access. This is equally powerful.
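A minimal sketch of “what” controls, assuming a toy table and made-up column names: row-level filters drop records the user shouldn’t see at all, while column masking hides sensitive values in the rows that remain.

```python
# Hypothetical "what" controls: once access is granted, privacy policies
# decide which rows survive and which column values get masked.
ROWS = [
    {"name": "Anne",  "ssn": "123-45-6789", "region": "US"},
    {"name": "Bjorn", "ssn": "987-65-4321", "region": "EU"},
]

def apply_what(rows, allowed_region, masked_columns):
    """Return only rows in allowed_region, with masked_columns redacted."""
    visible = []
    for row in rows:
        if row["region"] != allowed_region:
            continue  # row-level filter: hide out-of-region records
        visible.append(
            {k: ("***" if k in masked_columns else v) for k, v in row.items()}
        )
    return visible

print(apply_what(ROWS, "US", {"ssn"}))
# [{'name': 'Anne', 'ssn': '***', 'region': 'US'}]
```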
As a result, your legal and compliance team can spend their time building the “why” around each data source rather than performing manual approvals – a much better use of their time! Also, when you have to explain why users have access to what data source, it’s simple; if done correctly, you should have the “why” right there rather than presenting a list of people that approved access.
And all of this is exactly what Immuta allows you to do. We leverage your “who” to build plain English policies, which become the “why” and “what” on your data. And Immuta enforces those policies dynamically. Users can discover data, request access, be approved automatically (or not) based on the “why,” and only see what they are allowed to see based on the “what.” Everyone wins.
If you’ve already succumbed to RBAC, that’s OK – escaping is easier than you might think. First, focus on describing your users; then focus on how to enforce the “why” and “what” on your data – we can help with that part.
For a quick demonstration of how this works in Immuta, visit: https://vimeo.com/294031769/107ee3ea36
Click here for a 1:1 demo to see how Immuta’s platform can help you rapidly personalize data access and dramatically improve the creation, deployment, and auditability of machine learning and AI.