Like other modern advancements in data use, the data mesh concept is rooted in a central goal: give more users easier access to business-driving data. But like many data initiatives, implementation can be much more difficult than it seems.
A data mesh architecture has many moving parts, and it decentralizes data away from more traditional, centralized data architectures. For this kind of distributed ecosystem to work, dynamic controls are needed to govern data access and use at both the domain and organizational levels. In this blog, we'll break down the idea of federated data mesh governance and share the three components needed for a secure and streamlined implementation.
Breaking Down Data Mesh
The data mesh concept is based on decentralizing data ownership and enabling domain experts to independently develop and manage their own data products. In a departure from traditional centralized data warehouses, data mesh architectures separate data into self-contained domains, each organized around a specific business context, use case, or project.
This model transfers data ownership to the teams most closely aligned with it, enabling self-service data access and use that doesn’t require tedious or time-consuming requests and approvals. The data mesh is centered around four pillars, each necessary in order to achieve a successful implementation:
- Domain-Centric Ownership & Architecture
- Data-as-a-Product
- Self-Service Data Platform
- Federated Computational Governance
While each of these pillars is integral to operationalizing a data mesh architecture, federated computational governance might be the most essential to this model’s success.
The Importance of Federated Data Mesh Governance
The National Institute of Standards and Technology (NIST) defines data governance as:
“A set of processes that ensures that data assets are formally managed throughout the enterprise. A data governance model establishes authority and management and decision making parameters related to the data produced or managed by the enterprise.”
In the words of Zhamak Dehghani, a technology consultant and the originator of the data mesh, the data mesh framework was created "to decentralize the problem into smaller pieces and integrate them in a different way so that we can move faster, so that we can localize change to the domains, so that we can handle that continuous change and continuous growth."
The ultimate goal of the data mesh, then, is to avoid the drawbacks of centralized data ecosystems (such as limited scalability, access bottlenecks, and slow time-to-data) by decentralizing and distributing resources and responsibilities.
To ensure that this distributed ecosystem operates in a way that alleviates burdens rather than causing them, organizations need to maintain a system of intentional oversight across their various domains. This includes monitoring data access and use, and applying policies consistently, without hindering the self-service nature of domains. To fully operationalize a data mesh architecture, processes and data governance best practices must be in place to manage domain- and ecosystem-level access.
3 Components of Federated Data Mesh Governance
In distributed data mesh environments, there are two levels that require controls: the global (horizontal) level and the local (vertical) level. Decentralizing data demands that teams balance the delegation of local, domain-based management with the enforcement of consistent global security standards across the ecosystem.
The three key components that support effective data mesh governance are:
1. Global Policies
Global policies should be written and applied based on the most generic, all-encompassing principles of the data ecosystem. These principles should be informed by things like compliance laws and regulations, legal obligations, industry standards, and other high-level rules. Global policies are applied horizontally across the entire data ecosystem, acting as a policy umbrella that stretches across all domains. Examples of global policies might include:
- “Mask all PII data”
- “Anonymize all PHI data”
- “Encrypt all financial data sets”
For example, a healthcare and life sciences organization is subject to the requirements of the Health Insurance Portability and Accountability Act (HIPAA). HIPAA regulates how protected health information (PHI) and other healthcare data can be collected, stored, and used. Even if the organization set up distributed domains for its various departments, HIPAA's privacy requirements would apply to all of them. To comply, the organization would need to author and apply global policies that govern data use in every one of its domains.
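To make the horizontal scope concrete, here is a minimal Python sketch, assuming a hypothetical catalog in which every column carries classification tags such as `PII`, `PHI`, or `financial`. The `GlobalPolicy` class, the `CATALOG` contents, and `plan_global` are illustrative inventions, not the API of Immuta or any other product. Each global policy matches a tag and is evaluated against every domain; no domain can opt out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPolicy:
    """Hypothetical global policy: apply `action` to every column tagged `tag`."""
    tag: str
    action: str  # e.g. "mask", "anonymize", "encrypt" (labels only, not implementations)


# Illustrative catalog: domain -> dataset -> column -> classification tags.
CATALOG = {
    "billing": {"invoices": {"patient_name": {"PII"}, "amount": {"financial"}}},
    "clinical": {"visits": {"patient_name": {"PII"}, "diagnosis": {"PHI"}}},
}

GLOBAL_POLICIES = [
    GlobalPolicy("PII", "mask"),
    GlobalPolicy("PHI", "anonymize"),
    GlobalPolicy("financial", "encrypt"),
]


def plan_global(catalog, policies):
    """Map (domain, dataset, column) to the actions required by global policies.

    Every domain in the catalog is visited: global policies are horizontal and
    do not depend on which domain owns the data.
    """
    plan = {}
    for domain, datasets in catalog.items():
        for dataset, columns in datasets.items():
            for column, tags in columns.items():
                for policy in policies:
                    if policy.tag in tags:
                        key = (domain, dataset, column)
                        plan.setdefault(key, []).append(policy.action)
    return plan


for target, actions in sorted(plan_global(CATALOG, GLOBAL_POLICIES).items()):
    print(target, actions)
```

Because matches are collected in a list, a column carrying several tags (say, both `PII` and `PHI`) picks up every applicable action instead of having one silently overwrite another.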
2. Local Policies
Conversely, local policies should be more fine-grained and tailored to domain-specific purposes or use cases. While these may still be informed by higher-level regulations and requirements, they should be created within the context of each respective domain. Once written, they need only apply vertically within the specific domain(s) they were created for, not across the whole ecosystem. Examples of local policies might include:
- "Redact all rows that contain data tagged 'credit_card_number'"
- "Hash all columns marked 'home_address'"
- "Null all values tagged 'social_security_number'"
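Here is a minimal sketch of vertical scoping, assuming rows arrive as Python dictionaries and that column tags are supplied as a simple mapping from column name to tags. `LocalPolicy`, `apply_local`, the `payments` domain, and the sample rows are hypothetical. The key line is the filter on `policy.domain`: a local policy is simply ignored outside the domain that wrote it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalPolicy:
    """Hypothetical local policy that applies only inside `domain`."""
    domain: str
    tag: str
    action: str  # "redact_row", "hash", or "null"; other values are ignored


def apply_local(domain, rows, column_tags, policies):
    """Return a policy-enforced copy of `rows` for a dataset owned by `domain`."""
    scoped = [p for p in policies if p.domain == domain]  # vertical scope
    result = []
    for row in rows:
        new_row = dict(row)  # never mutate the caller's data
        redact = False
        for policy in scoped:
            for column, tags in column_tags.items():
                if policy.tag not in tags or new_row.get(column) is None:
                    continue
                if policy.action == "redact_row":
                    redact = True
                elif policy.action == "hash":
                    raw = str(new_row[column]).encode("utf-8")
                    new_row[column] = hashlib.sha256(raw).hexdigest()
                elif policy.action == "null":
                    new_row[column] = None
        if not redact:
            result.append(new_row)
    return result


POLICIES = [
    LocalPolicy("payments", "credit_card_number", "redact_row"),
    LocalPolicy("payments", "home_address", "hash"),
    LocalPolicy("payments", "social_security_number", "null"),
]
COLUMN_TAGS = {
    "card": {"credit_card_number"},
    "address": {"home_address"},
    "ssn": {"social_security_number"},
}
ROWS = [
    {"id": 1, "card": None, "address": "1 Main St", "ssn": "123-45-6789"},
    {"id": 2, "card": "4111111111111111", "address": "2 Elm St", "ssn": "987-65-4321"},
]

print(apply_local("payments", ROWS, COLUMN_TAGS, POLICIES))   # row 2 redacted; row 1 hashed and nulled
print(apply_local("marketing", ROWS, COLUMN_TAGS, POLICIES))  # no local policies here: unchanged
```

Plain SHA-256 is used only for brevity; a production system would typically use a keyed hash (such as HMAC) so that low-entropy values like street addresses cannot be recovered by brute force.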
Returning to our healthcare example, an organization might create specific domains to facilitate billing for different geographic regions. Each domain would be responsible for storing, and controlling access to, the customer data from its region. Within these domains, some data users could be responsible for specific subregions, be they states or countries, and would not necessarily need, or have the right, to access information from other regions. The domain owners could create local policies that redact, null, mask, or otherwise obscure out-of-region data for these users in order to maintain compliance.
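For the regional billing scenario, a row-level local policy might look like the sketch below. It assumes each user record lists the subregions that user is responsible for and each billing row carries a `subregion` column; `User`, `visible_rows`, and the sample data are hypothetical. Redaction (dropping the row) is shown, but the same membership check could drive nulling or masking instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record within a regional billing domain."""
    name: str
    subregions: frozenset  # states or countries this user is responsible for


def visible_rows(user, rows, region_column="subregion"):
    """Local row-level policy: redact rows from outside the user's subregions."""
    return [row for row in rows if row.get(region_column) in user.subregions]


# Sample data for a hypothetical "billing-us-west" domain.
invoices = [
    {"invoice": 101, "subregion": "CA", "amount": 120.00},
    {"invoice": 102, "subregion": "OR", "amount": 80.00},
    {"invoice": 103, "subregion": "CA", "amount": 45.50},
]
analyst = User("ana", frozenset({"CA"}))

print([row["invoice"] for row in visible_rows(analyst, invoices)])  # [101, 103]
```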
3. Consistent Oversight
The most important part of governing a distributed data mesh ecosystem is the ability to maintain consistent, total oversight across domains. Think of it like a teacher who breaks a class into groups for a project: The students are given the materials to complete their work and the responsibility to follow through, but the teacher is still in the room watching over progress and helping to keep everyone on track.
The challenge of federated governance is finding the right balance of global and local policies and making sure they work as intended. Teams must avoid risky gaps in coverage, but over-restricting data hinders domain teams' self-service work and negates the intended benefits of the data mesh.
This requires the ability to monitor data mesh activity and detect anomalous behavior. By continuously monitoring policy enforcement at both the global and local levels, organizations can ensure that their data mesh operates effectively without posing unnecessary risk to data privacy and security.
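Anomaly detection can be as sophisticated as a dedicated monitoring product, but the core idea fits in a few lines. The sketch below is a simple statistical baseline, not a description of any particular product: it assumes daily query counts are already collected per (user, domain) pair, and it flags any pair whose count today sits more than `threshold` standard deviations above its own history. The function name, threshold, and data are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(history, today, threshold=3.0):
    """Return the keys whose activity today is unusually high versus their baseline.

    history: {(user, domain): [past daily query counts]}
    today:   {(user, domain): today's query count}
    """
    flagged = []
    for key, count in today.items():
        past = history.get(key, [])
        if len(past) < 2:
            continue  # too little history to form a baseline; review separately
        # Floor the spread at 1 query so perfectly steady users are not flagged
        # for a single extra query (and to avoid dividing by zero).
        spread = max(stdev(past), 1.0)
        if (count - mean(past)) / spread > threshold:
            flagged.append(key)
    return sorted(flagged)


history = {
    ("ana", "billing-us-west"): [10, 12, 11, 9, 10],
    ("raj", "clinical"): [50, 55, 45, 52, 48],
}
today = {
    ("ana", "billing-us-west"): 60,  # sudden spike
    ("raj", "clinical"): 53,         # within normal variation
    ("new", "clinical"): 5,          # no baseline yet
}
print(flag_anomalies(history, today))  # [('ana', 'billing-us-west')]
```

Keying the baseline by (user, domain) rather than by user alone means a spike inside one domain is visible even when that user's total activity across the mesh looks normal.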
How to Apply Federated Data Mesh Governance
Effective application of federated computational governance in a data mesh has two main requirements:
Technological Support
Teams need platforms and tools that make data policy creation, application, and oversight a streamlined, self-service process. A data security platform can meet these data governance and monitoring needs through a suite of dynamic tools. Uniting sensitive data discovery and classification, easy-to-author attribute-based access controls, and data detection and monitoring in one self-service platform makes these capabilities available at both the global and local levels. Teams can create, enforce, and monitor ecosystem-wide or domain-specific policies, update them as necessary, and track their effectiveness to identify gaps and remediate issues.
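Attribute-based access control (ABAC) decides access from attributes of the user and the data rather than from a fixed list of grants, which is what lets a single rule cover many datasets. The sketch below shows one way global and local ABAC rules might compose: global rules are checked on every request, and a domain's local rules are added only for that domain's data. The rule functions, attribute names, and domains are hypothetical; this is not Immuta's policy syntax.

```python
from typing import Any, Callable, Dict, List

Attrs = Dict[str, Any]
Rule = Callable[[Attrs, Attrs], bool]


def phi_requires_training(user: Attrs, data: Attrs) -> bool:
    """Global rule: PHI is visible only to users with HIPAA training."""
    return "PHI" not in data["tags"] or bool(user.get("hipaa_trained"))


def same_region(user: Attrs, data: Attrs) -> bool:
    """Local rule for billing: users see only their own region's datasets."""
    return user.get("region") == data.get("region")


GLOBAL_RULES: List[Rule] = [phi_requires_training]
LOCAL_RULES: Dict[str, List[Rule]] = {"billing": [same_region]}


def can_access(user: Attrs, data: Attrs) -> bool:
    """Grant access only if every global rule and every rule of the data's domain passes."""
    rules = GLOBAL_RULES + LOCAL_RULES.get(data["domain"], [])
    return all(rule(user, data) for rule in rules)


trained_west = {"region": "US-West", "hipaa_trained": True}
untrained_west = {"region": "US-West", "hipaa_trained": False}
west_claims = {"domain": "billing", "region": "US-West", "tags": {"PHI"}}
east_claims = {"domain": "billing", "region": "US-East", "tags": {"PHI"}}
trial_data = {"domain": "research", "region": "EU", "tags": {"PHI"}}

print(can_access(trained_west, west_claims))    # True
print(can_access(untrained_west, west_claims))  # False: global PHI rule
print(can_access(trained_west, east_claims))    # False: billing's local region rule
print(can_access(trained_west, trial_data))     # True: research has no region rule
```

Onboarding a new domain means adding an entry to `LOCAL_RULES` without touching the global list, which mirrors the split of responsibility between domain owners and central governance.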
Organizational Alignment
Ultimately, the data mesh is as much an organizational framework as it is a technical architecture. Even with a data security platform that offers the necessary technical capabilities, teams need to unite behind the data mesh mindset to ensure appropriate use and realize the benefits that follow. Organizations must identify internal data mesh champions to lead adoption, using their tools to demonstrate the data mesh's benefits to their peers. Combined with consistent cross-domain communication and regular global assessments, this leadership puts an organization in a better position to run a successful and secure self-service data mesh architecture.
To learn more about how modern teams are applying their data mesh architectures, check out how Roche federated governance across distributed domains. If you’d like to learn more about the enabling role of a data security platform in a data mesh implementation, request a demo from an Immuta expert today.