Data Mesh Security Best Practices

The data mesh framework places data ownership in the hands of those who know the data best: the domain-level experts. It allows those closest to the data to create and share data products independently of a central data team, accelerating insights and innovation. But the approach also creates a unique set of data security challenges.

To gain its benefits without putting data at risk, organizations must implement data mesh security best practices. In this guide, we’ll explore what a data mesh is and how it works, the security challenges it presents, and a collection of best practices to mitigate risk.

What Is a Data Mesh?

Data mesh as a concept is based on decentralizing data ownership and giving domain experts the ability to independently develop and manage their own data products. A data mesh architecture organizes data into self-contained domains, with responsibility for data governance and quality resting with each domain team.

This modern approach to data systems creates an architecture that can accommodate hybrid and multi-cloud environments and get data into the hands of the people who need it. At the same time, it helps break down data silos and eases governance challenges by reducing the risk of IT bottlenecks. Zhamak Dehghani is credited with originating the concept and coining the term “data mesh.”

Underpinning the data mesh system is an enterprise-wide set of data standards that ensure consistency, interoperability, and adherence to data security protocols. Most organizations using this approach maintain a domain-agnostic data infrastructure built on a central platform that provides data pipeline engines, data storage, and streaming infrastructure.

Key Principles of Data Mesh

The concept of the data mesh is built around several key principles or assumptions:

  • Innovation is driven by collaboration: To be truly data-driven, an organization must ensure decision makers have access to the data they need when they need it. Because no one knows the data better than those generating and using it, data owners are encouraged to share knowledge about their data with other teams who use it.
  • Domain teams are responsible for governance: Placing governance responsibilities on domain teams has two benefits. First, effective governance requires a thorough understanding of the data, which the domain teams have. Second, governance is no longer bottlenecked by a single central team, which can accelerate access.
  • Data is handled as a product: Domain teams manage their data as products, sharing it across lines of business or with other users through clearly defined contracts that build in security standards.
  • Data access is self-service: The data infrastructure is created in such a way that teams can manage their data pipelines, provision data products, and operate autonomously, so data is quickly accessible to those who need it.
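To make the idea of contracts that “build in security standards” more concrete, here is a minimal Python sketch of a data product contract that a platform team might check automatically before a domain publishes a product. The class names, the sensitivity labels, and the rule that every sensitive column must reference a masking policy are illustrative assumptions, not features of any particular data mesh product.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sensitivity labels; each organization defines its own taxonomy.
SENSITIVE_LABELS = {"PII", "PHI", "PCI"}


@dataclass
class ColumnSpec:
    name: str
    dtype: str
    classification: str = "PUBLIC"        # e.g. "PUBLIC", "INTERNAL", "PII"
    masking_policy: Optional[str] = None  # name of a centrally managed policy, if any


@dataclass
class DataProductContract:
    product: str
    owner_domain: str
    columns: List[ColumnSpec] = field(default_factory=list)

    def violations(self) -> List[str]:
        """Check the contract against a (hypothetical) enterprise-wide security standard."""
        problems = []
        if not self.owner_domain:
            problems.append("contract has no owning domain")
        for col in self.columns:
            if col.classification in SENSITIVE_LABELS and col.masking_policy is None:
                problems.append(f"{col.name}: {col.classification} column has no masking policy")
        return problems


orders = DataProductContract(
    product="orders_v1",
    owner_domain="sales",
    columns=[
        ColumnSpec("order_id", "INT"),
        ColumnSpec("customer_email", "VARCHAR", "PII"),
    ],
)
print(orders.violations())  # ['customer_email: PII column has no masking policy']
```

Running a check like this in the publishing pipeline is one way to let domains stay autonomous while the global standard is still enforced the same way for every product.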

What Problems Does a Data Mesh Architecture Solve?

Traditional, highly centralized data management approaches have failed to keep pace with the increasing needs of diverse data users. Bottlenecks created by overwhelmed data teams, siloed data, and inconsistent data management practices reduce efficiency and increase costs. The data mesh’s decentralized approach to data ownership and governance mitigates these issues. Let’s take a deeper dive into the specific issues a data mesh addresses and how it can help businesses manage their data more effectively.

Data Quality and Availability

One of the fundamental principles of the data mesh is treating data as a product. Each domain team, typically aligned with a business unit, is responsible for developing, managing, transforming, and serving its data products and sharing them with other users within the organization. When the individuals who know their data best control it, data quality and availability tend to be higher than when data is managed by a centralized team of data specialists.

Timeliness of Data Access

The domain-oriented design of the data mesh frees data owners and users to use information in new ways. Rather than relying on an overwhelmed engineering or IT team to perform ad-hoc queries, the data mesh supports a self-service model that makes data more accessible. This approach cuts out the middleman, fostering innovation and creativity by removing the institutional barriers that prevent quick access to relevant data.

Cross-Team Collaboration

With data from all domains governed by a universal set of standards and dynamic policies, data can be freely shared between business units. Consistency in data formatting, governance, discoverability, and classification encourages the kind of cross-domain collaboration that drives innovation and growth. Replicating this data synergy using a traditional, monolithic approach to data management is all but impossible.

Data Mesh Security Challenges

The decentralization of data ownership and governance can make it difficult to establish and enforce a consistent set of global security policies. Organizations that choose to implement a data mesh architecture must address the security challenges that come with decentralization.

Visibility

This security challenge is not exclusive to the data mesh, but the highly distributed nature of the data mesh compounds it. With each domain owner collecting, processing, and using data independently, data monitoring and detection of unauthorized or unnecessary access is more difficult. Organizations must ensure they have full visibility into who has access to sensitive data, when they’re accessing it, and how they’re using it.

Data Access Control

Least privilege enforcement has become a priority for many companies that are legally required to ensure that only the right people can access the right data – and no more. Compliance with government and industry regulations around sensitive data depends on robust data access control. Distributed data ownership and cross-domain collaboration make data access harder to control, and organizations should plan for both.

Maintaining Data Quality and Consistency

In order to tap the full potential of the data mesh, each data product must meet a minimum standard of quality. As product owners, each domain team is responsible for ensuring the data they manage is visible, accessible, and secure. Organizations must give each business unit the tools, training, and resources required to meet these quality standards. Providing access to a central data catalog can help data owners collect, organize, access, and enrich their metadata to support data discovery and governance standards.

Data Mesh Security Best Practices

To adequately address data mesh security challenges, organizations should implement the following five data mesh security best practices:

1. Inventory, Categorize, and Track Sensitive Data

Developing an accurate understanding of what sensitive data exists, how it’s being used, and where it is located is the cornerstone practice for securing your data mesh. One of the easiest ways to accomplish this is to use a data security platform that automatically detects sensitive data and applies consistent classification and tagging conventions.
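Commercial platforms implement automatic detection in their own ways; as a rough illustration of the idea, the Python sketch below samples values from each column, tests them against a few regular expressions, and tags the column when most samples match. The pattern set, the 80% threshold, and the function names are assumptions made for this example, and production classifiers typically combine many more signals (column names, checksums, statistical or ML models).

```python
import re

# Illustrative detection rules only. Rules are checked in order and the first
# match wins, so the more specific SSN pattern is tried before the looser PHONE one.
PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "PHONE": re.compile(r"^\+?\d[\d\s().-]{7,}\d$"),
}


def classify_column(samples, threshold=0.8):
    """Return the first tag matching at least `threshold` of the non-empty samples, else None."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return None
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return tag
    return None


def tag_table(table):
    """`table` maps column name -> sampled values; returns column name -> tag (or None)."""
    return {col: classify_column(vals) for col, vals in table.items()}


sample = {
    "contact": ["ana@example.com", "li@example.org", "sam@example.net"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "notes": ["call back", "n/a"],
}
print(tag_table(sample))  # {'contact': 'EMAIL', 'ssn': 'US_SSN', 'notes': None}
```

The resulting tags are what downstream controls, such as masking policies, can key on, which is why consistent tagging conventions matter across every domain.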

2. Centralize Data Access and Privacy Controls

Centralizing data access and privacy controls lets organizations consistently enforce enterprise-wide access and security policies across multiple cloud data platforms and domain owners, using modern, fine-grained, attribute-based access control (ABAC). Separating policies from individual platforms helps achieve scalability, while determining access based on multi-dimensional attributes (as opposed to static roles) provides flexibility and granularity. Data privacy controls such as dynamic data masking and anonymization also help teams scale access protection, because these techniques can be centrally enforced in hybrid and multi-cloud environments.
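The Python sketch below illustrates the two ideas in this practice: an attribute-based access decision and dynamic masking applied at read time. It is a toy policy engine, not any vendor’s API; the `User` attributes, the `APPROVED_PURPOSES` table, and the “show only the last four characters” masking rule are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class User:
    name: str
    purpose: str               # declared purpose of access, e.g. "finance_reporting"
    clearance: FrozenSet[str]  # sensitivity tags this user may see unmasked


# Central, platform-independent policy (hypothetical values): which declared
# purposes may read each data product at all.
APPROVED_PURPOSES: Dict[str, FrozenSet[str]] = {
    "orders_v1": frozenset({"fraud_investigation", "finance_reporting"}),
}


def mask(value: str) -> str:
    """Dynamic masking: keep the last 4 characters; fully mask short values."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def read(user: User, product: str, rows: List[dict], column_tags: Dict[str, str]) -> List[dict]:
    """Evaluate attributes at query time; stored data is never modified."""
    if user.purpose not in APPROVED_PURPOSES.get(product, frozenset()):
        raise PermissionError(f"{user.name}: purpose '{user.purpose}' not approved for {product}")

    def visible(col: str) -> bool:
        tag = column_tags.get(col)
        return tag is None or tag in user.clearance

    return [
        {col: (val if visible(col) else mask(str(val))) for col, val in row.items()}
        for row in rows
    ]


rows = [{"order_id": 1001, "card_number": "4111111111111111"}]
tags = {"card_number": "PCI"}
analyst = User("dana", "finance_reporting", frozenset())
auditor = User("lee", "fraud_investigation", frozenset({"PCI"}))
print(read(analyst, "orders_v1", rows, tags))  # card_number masked except its last 4 digits
print(read(auditor, "orders_v1", rows, tags))  # card_number shown in full
```

Because the decision depends on attributes (purpose, clearance, column tags) rather than a fixed role, supporting a new purpose or sensitivity tag means editing one central policy instead of creating new roles on every platform.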

3. Integrate Zero Trust Principles

A zero trust approach requires all data users to submit to a continuous process of authentication, authorization, and validation in order to maintain access to network resources. Sometimes known as perimeterless security, zero trust is an important component of data mesh security, protecting distributed data from internal and external threats.
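Zero trust is a set of principles rather than a single mechanism, so the following is only a minimal Python sketch of one of them: every request is verified from scratch instead of being trusted because an earlier request succeeded. It uses short-lived tokens signed with the standard-library `hmac` module; the token format, the five-minute lifetime, the device-posture flag, and the `is_authorized` callback are all assumptions for illustration. A production system would normally rely on an identity provider and a standard token format such as JWT.

```python
import hashlib
import hmac
import time
from typing import Callable

SECRET = b"demo-only-secret"  # hypothetical; real keys would come from a managed key service
TOKEN_TTL = 300               # short-lived credentials (5 minutes) force frequent re-authentication


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user: str, now: float) -> str:
    """Issue a signed, expiring token. Assumes user names contain no '|'."""
    payload = f"{user}|{int(now) + TOKEN_TTL}"
    return f"{payload}|{_sign(payload)}"


def verify_request(token: str, device_ok: bool,
                   is_authorized: Callable[[str], bool], now: float) -> str:
    """Re-verify every request: signature, expiry, device posture, then policy."""
    parts = token.split("|")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    user, expires, sig = parts
    if not hmac.compare_digest(sig, _sign(f"{user}|{expires}")):
        raise PermissionError("invalid signature")
    if now > int(expires):
        raise PermissionError("token expired; re-authenticate")
    if not device_ok:
        raise PermissionError("device failed posture check")
    if not is_authorized(user):
        raise PermissionError("policy denies access")
    return user


t0 = time.time()
token = issue_token("dana", t0)
allow_dana = lambda user: user == "dana"  # stand-in for a call to a central policy engine
print(verify_request(token, True, allow_dana, t0 + 60))  # dana
```

The same token is rejected a few minutes later, or immediately if the device check or the policy decision changes, which is the “continuous” part of continuous verification.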

4. Continuously Monitor and Track User Behavior

Monitoring and tracking how data is being accessed, modified, and deleted helps security teams spot inappropriate patterns of usage. Using logging and auditing tools, security teams can track user activity, detect anomalous behavior, and identify potential security threats, enabling proactive intervention and accelerating post-incident remediation efforts. A data security platform provides a centralized dashboard for monitoring data access and modification activities, tracking requests and access to data, reviewing audit logs, enabling user behavior analytics, and viewing policy changes across the entire data mesh.
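As a simplified illustration of user behavior analytics, the Python sketch below aggregates audit-log events into rows read per user and flags anyone whose volume today is far above their own historical baseline. The event tuple layout, the five-day minimum history, and the z-score threshold of 3 are assumptions chosen for the example; real monitoring tools use richer features such as time of day, tables touched, and peer-group comparisons.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_rows_by_user(events):
    """Aggregate audit-log events (user, table, rows_read) into total rows read per user."""
    totals = defaultdict(int)
    for user, _table, rows_read in events:
        totals[user] += rows_read
    return dict(totals)


def flag_anomalies(history, today, z_threshold=3.0, min_days=5):
    """Flag users whose volume today is far above their own baseline (simple z-score)."""
    alerts = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < min_days:
            alerts.append((user, count, "no baseline yet"))
            continue
        sigma = pstdev(past) or 1.0  # avoid dividing by zero for a perfectly flat baseline
        z = (count - mean(past)) / sigma
        if z > z_threshold:
            alerts.append((user, count, f"z={z:.1f}"))
    return alerts


history = {"dana": [120, 95, 130, 110, 105], "lee": [40, 55, 38, 60, 47]}
events = [("dana", "orders_v1", 118), ("lee", "customers_v2", 5000), ("lee", "orders_v1", 200)]
print(flag_anomalies(history, daily_rows_by_user(events)))  # only "lee" is flagged
```

Comparing each user against their own history, rather than a single global limit, keeps alerts meaningful when different domains have very different normal usage.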

5. Implement a Data Security Platform Easily Usable by Non-Technical Stakeholders

The most effective data security platforms don’t require technical expertise to use, which accelerates workflows and enables collaboration. Features such as plain language policy authoring empower non-technical stakeholders to create and understand policies, and can help simplify the process of proving compliance. By involving non-technical users with domain-level data expertise, organizations can more effectively secure data access across the data mesh, prevent data leakage, and integrate zero trust principles into their entire data architecture.

Strengthening the Security of Your Data Mesh Architecture

The data mesh is a decentralized and flexible approach to data management that allows data owners and other users to access information quickly and efficiently. Although this approach brings with it some unique challenges, organizations that establish data mesh security best practices can leverage its many benefits while protecting their data.

Read Powering Your Data Mesh with Snowflake to learn the four key components of data mesh, how to manage each when rolling out a data mesh architecture, and how to harness Snowflake to turbocharge your data mesh.