For Dave DeWalt, Founder and Managing Director of advisory firm (and Immuta investor) NightDragon, the modern data stack, like a dragon, has a compelling dual nature. Both balance power with danger: each holds immense potential strength that can go up in flames (in the dragon’s case, quite literally) if not managed properly.
DeWalt; John Cordo, Principal at NightDragon; Mike Holmberg, Data Privacy Tech Leader at HP; and Matthew Carroll, CEO & Co-Founder of Immuta, discussed this concept in their webinar Taming the Data Dragon: Why Managing Access Is Critical for the Future of Data Use. In this blog, we’ll define the modern data stack, highlight these data-driven leaders’ three key insights about addressing prevalent challenges, and explore how to make the power of modern data work for you.
What is the Modern Data Stack?
Over time, the platforms and tools available for data collection, storage, and analysis have evolved significantly. Legacy data stacks were traditionally composed of a select number of on-premises tools with distinct data storage and processing functions. Due to their static nature and somewhat limited processing power, these tools alone have been unable to scale alongside the overall evolution of data use and resources.
To facilitate this need to scale, modern data stacks involve a wider variety of tools. The modern data stack is made up of a combination of on-premises and cloud-based software tools used to gather, store, process, and analyze data resources. By incorporating cloud-based platforms into the data stack, organizations gain the capacity to scale more effectively as they grow their data, users, and use cases.
Typical Components of the Modern Data Stack
While use case and industry variety means that there’s no standard blueprint for the modern data stack, there are a handful of common components that can be found across most organizations. These components include:
- Data Collection: These tools gather data from a vast array of sources, from users and customers to existing data sets and beyond. Examples of data collection tools include Salesforce, HubSpot, and Zendesk.
- Data Integration & Transformation: These tools are used to ingest, consolidate, clean, and transfer data from its original raw state to something accessible and consumable by data users. Examples of integration and transformation tools include dbt, Fivetran, and Talend.
- Data Storage: These tools provide a location for ingested and transformed data to be stored and accessed for a variety of use cases. Examples of data storage platforms include Snowflake, Databricks, Starburst, Google BigQuery, and Amazon Redshift.
- Business Intelligence (BI) & Analytics: These tools are meant to facilitate the efficient use of data resources by data users within the ecosystem, driving business objectives and enhanced insights. Examples of BI & analytics tools include Tableau, Power BI, and Looker.
Common Challenges of the Modern Data Stack
Most data stack challenges are rooted in the sheer volume of data involved in modern use and analytics. In the early days of data use, information was collected and stored in smaller quantities and often kept separate from other technology by firewalls and other defensive mechanisms. “Now we’ve just had this massive explosion of data,” noted DeWalt. Holmberg echoed his sentiment that there is simply more data now than ever before.
On top of the growing amount of data, the hybrid nature of the modern data stack increases the number of locations in which data is stored, accessed, and utilized. This comes from the increasingly common migration to the cloud, as the promises of cloud storage and computing influence more organizations to make the move. In addition, the “work from anywhere” model of business spurred by the COVID-19 pandemic means that data is accessed from more locations than ever before. Add in the growing number of data privacy regulations impacting data use, and you’ve got a troublesome stew of sensitive information, users who need to access it, and laws that govern its utilization.
The combination of extensive data points and burgeoning technologies creates a situation where analytical needs are outpacing traditional approaches to managing data access. Carroll highlighted this challenge, noting that “these are nascent technologies that are being adopted at rates that we didn’t see in any [other] time in technology and computing infrastructure.” This is all occurring with “no real cohesive single standard [for] data privacy” according to DeWalt, which presents a massive challenge for data teams who need to respond to increasing pressure from both data users and regulatory requirements. Describing this growing concern, Carroll noted that “more users are able to access more data than ever before, and…typically you have controls in place. We just don’t have those controls yet.”
This is the ultimate risk associated with the modern data stack. The overwhelming shift to cloud-based data storage and analytics is outpacing organizations’ ability to properly control data access. As the attack surface available to malicious actors widens, organizations must work to ensure that their data is accessed efficiently without compromising its security.
Facing the Challenges of the Modern Data Stack
With such immediate risk surrounding the future of data and obscuring the abundant benefits of leveraging data resources, what steps can be taken to effectively address these concerns? Here’s what the experts had to say:
1. Separate Policy from Platform
The number of cloud data platform providers is only going to continue growing and diversifying. As organizations adopt multiple technologies to best suit their needs, there remains one constant: data must be governed wherever it lives. If data access control policies are created and maintained in each of these individual platforms, then the data will not be subject to a consistent standard of protection.
Rather than attaching policies to individual platforms, data teams must implement tools that allow for consistent policy authoring and enforcement across their data ecosystems. Speaking to this necessity, Carroll emphasized that “in this age of modern data security and privacy…you need to be able to separate the policy from those platforms in order to dynamically control [the data] at scale.” To elaborate on this point, he laid out his “three key pieces of modern data security in cloud data infrastructure,” spotlighting the importance of:
- Separating data rather than keeping it all in one bucket
- Making data de-identification a standard practice
- Building and implementing next-gen monitoring capabilities
Keeping policy separate from the platform enhances modern data stacks’ versatility and allows data teams to apply universal policies and control access wherever the data travels.
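As a rough illustration of this idea, the sketch below defines one masking policy in a single place and enforces it on two platforms that hold no policy logic of their own. Everything in it is hypothetical: `MaskingPolicy`, `Platform`, the role names, and the sample rows are invented for illustration and do not represent Immuta’s product or any vendor’s API. The hashing step is only a stand-in for de-identification.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingPolicy:
    """Hypothetical platform-independent rule: mask a column unless the user holds a role."""
    column: str
    allowed_role: str


def mask(value: str) -> str:
    """De-identify a value with a truncated one-way SHA-256 digest.

    Illustration only: a real deployment would use salted or keyed hashing,
    since plain hashes of guessable values can be reversed by lookup.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def apply_policy(policy, rows, user_roles):
    """Return copies of the rows, masking the policy's column for unauthorized users."""
    if policy.allowed_role in user_roles:
        return [dict(row) for row in rows]
    return [
        {k: mask(str(v)) if k == policy.column else v for k, v in row.items()}
        for row in rows
    ]


class Platform:
    """Hypothetical stand-in for a data platform: it stores rows but owns no policy."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def query(self, policy, user_roles):
        # Every platform defers to the same externally defined policy.
        return apply_policy(policy, self.rows, user_roles)


# The policy is authored once, outside of either platform.
email_policy = MaskingPolicy(column="email", allowed_role="privacy_officer")

warehouse = Platform("warehouse", [{"id": 1, "email": "ana@example.com"}])
lakehouse = Platform("lakehouse", [{"id": 7, "email": "ana@example.com"}])

analyst_view = [p.query(email_policy, {"analyst"}) for p in (warehouse, lakehouse)]
officer_view = [p.query(email_policy, {"privacy_officer"}) for p in (warehouse, lakehouse)]
```

Because the rule lives outside both platforms, changing `email_policy` changes the behavior everywhere at once, which is the consistency that per-platform policies cannot guarantee.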
2. Make Data Security a Priority from the Start
“If you don’t have security, you do not get privacy,” expressed Holmberg. The sheer variability of data use in modern environments adds a complex layer to data security. Addressing this complexity, Holmberg noted that “we seem to invent new uses of data [that] have not only jurisdictional variation, but then customer variation [and] product entitlement variation,” almost constantly. Continuously identifying new uses for enterprise data generates a complex data landscape that, while useful for business objectives, still needs to be effectively secured.
While this variability cannot be predicted, it can be accounted for as organizations build their modern data stacks. As data teams take the “lift and shift” approach to cloud migration, security measures should be baked in from the start. DeWalt described this mentality as “designing in” security capabilities, asking “as we lift and shift that data, are we designing in our access rights [and] policies…to really come together as a team to manage and reduce risk?” Scalable security and privacy methods must be built into the foundation of any modern data stack.
Proactively determining access control and security policies allows teams to avert risk rather than reactively dealing with it. Holmberg referenced HP’s cloud migration story to describe their approach to “designing in” security while building their data stack.
“That is the heart of privacy by design,” Holmberg claimed, “saying ‘Ok, we’re building something new, how do we design this in and make these capabilities fundamental at the start?’”
By asking these questions, and finding the tools and techniques to make foundational security possible, organizations can ensure that their modern data stack is secure and resilient to risk.
3. Structure Your Data Teams for Success
While tools and technologies are the organs that keep the modern data stack operating, they can’t be implemented without the right people. Taking on the multifaceted challenge of managing the flow of data and ensuring secure data use requires teams to be organized and aligned towards the right goals.
“It all starts with org structure,” said Carroll. “Scale starts with people [who] are put into a position where their responsibility is ‘How are we going to scale our data?’”
Automation and proper tooling can do wonders, but it is the people making data access control decisions who have the last word in scaling organizational data success. Roche, a Swiss healthcare company with global operations, implemented such an organizational structure in order to reach its data-driven objectives. Carroll noted that Roche added data product managers into each business line to focus specifically on the company’s data assets. These managers are explicitly responsible for data utilization and security, working internally with a range of relevant players (CSOs, CISOs, etc.) to keep data viable and safe.
“I think people underestimate the amount of people that need to be working together to…execute a program to not only deliver value of data, but oversee the integrity of it and make sure it’s protected,” remarked Carroll. By creating roles that solve for this complexity, as well as the intricacy of the organization’s data architecture, teams can strengthen their personnel and manage the modern data stack confidently.
Achieving Success with the Modern Data Stack
“This is not a roadblock, we’re actually the roadbuilders,” said Holmberg on building security into the data stack. “We’re trying to pave the highway [so] you can go on as fast as you want, we just don’t want you to go off-road.”
While the multifarious challenges of increasing data use cases, fast-moving technologies, and widespread data accessibility are unavoidable, they can be tamed with a proactive approach to data security and privacy. By separating policy from platform, proactively “designing in” security for data stacks, and intentionally structuring personnel, you can avert the dangers of the modern data stack and safely optimize the power of data.
Choosing the right technologies to achieve these goals is also key to success. The Immuta Data Security Platform allows users to separate policy creation and orchestration from platform and write comprehensive plain-language access policies that are automatically applied across an entire data ecosystem. These policies can be written, maintained, and understood by any stakeholder, so that personnel throughout the organization have the power and visibility they need to protect data assets. With dynamic attribute-based access controls, these policies can scale and maintain security as new platforms, users, and data are added to the modern data stack.
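To make the general idea of attribute-based access control concrete, here is a minimal sketch, assuming a simple rule in which a user may read a dataset only if the user holds every attribute the dataset is tagged with. The `User` and `Dataset` types, the attribute strings, and the `can_access` function are all hypothetical and are not drawn from Immuta’s platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset


@dataclass(frozen=True)
class Dataset:
    name: str
    tags: frozenset


def can_access(user, dataset):
    """Grant access only if the user holds every attribute the dataset's tags require."""
    return dataset.tags <= user.attributes


analyst = User("analyst", frozenset({"region:eu"}))
auditor = User("auditor", frozenset({"region:eu", "clearance:pii"}))

sales = Dataset("eu_sales", frozenset({"region:eu"}))
customers = Dataset("eu_customers", frozenset({"region:eu", "clearance:pii"}))

# A dataset added later is covered by the same rule; no per-dataset grant is written.
new_dataset = Dataset("eu_support_tickets", frozenset({"region:eu", "clearance:pii"}))

decisions = {
    (u.name, d.name): can_access(u, d)
    for u in (analyst, auditor)
    for d in (sales, customers, new_dataset)
}
```

The point of the design is that the rule is written once against attributes rather than against named users or tables, so adding `new_dataset` or a new user requires tagging, not new policy.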
To see how simple policy creation and implementation can be, try our self-guided walkthrough demo today. And if you’d like to hear more from these data-centric experts, you can watch the full webinar here.