Top 3 Lessons from Databricks’ 2022 Data+AI Summit

The Databricks Data+AI Summit, the world’s largest data and AI conference, took over downtown San Francisco during the last week of June. Thousands of data innovators converged on the event for four days of seminars, training sessions, and one-on-one networking opportunities. Whether you want to relive the experience or are catching up on what you might have missed, we’ve got you covered.

Our on-the-ground team compiled the most exciting and influential takeaways from keynote presentations, customer presentations, and our conversations with other attendees. Here are our three must-know learnings:

1. Scalable ML Depends on Data Security and Quality

Artificial intelligence (AI) and machine learning (ML) models are only as good as the data used to build them. Just as you can’t get to an unfamiliar destination with incomplete directions, you can’t rely on bad data to deliver accurate ML models.

Scaling ML is a priority for leaders across industries, which means the stakes are high for getting it right. From Instacart to Insitro, speakers at the Data+AI Summit emphasized the critical need for scalable ML to power real-time, business-driving insights. Data security and data quality, however, are often at odds. Putting the proper access controls in place may mean revoking or tightly restricting the read and write privileges that are key to managing data quality.

AI pioneer Peter Norvig, author of Artificial Intelligence: A Modern Approach, argued in his keynote that while ML models are certainly dependent on data quality, wrangling and managing that data are actually the hardest parts of the process. This entails knowing what data you have and where it’s from, curating it for consumption and model integration, and monitoring its use. Integrating solutions that automate data discovery, security, and monitoring helps simplify these complex processes. In turn, governance, risk, and compliance teams can rest assured that authorized users have read and write access to the appropriate data, allowing data users to build accurate, resilient ML models without sacrificing security.

2. Modern Data Challenges Require Modern Solutions

This direct quote, heard during Pumpjack Dataworks’ presentation on building a modern data stack with Immuta and Databricks, rang true as a theme throughout the entire Data+AI Summit. In addition to Pumpjack, leaders from Bayer, Coinbase, and more cited the need to move away from legacy tools and static processes, and toward flexible, cloud-forward solutions.

A driving force behind cloud migration and modernization journeys is the need to deliver value to both the business and consumers. In their speaking session, data leaders from TD Bank noted that their cloud investments are intended to accelerate data transformations and insights, and in turn positively impact both internal decision making and the customer experience. Similarly, Walgreens’ presenters explained how their mission of building a modern, neighborhood health destination and filling more than 825 million prescriptions annually is predicated on maintaining a single source of truth for data assets that avoids duplication while preserving security.

The strict regulations governing both the financial services and healthcare industries mean that modern data stacks must have data security and privacy capabilities built in. Data leaders should therefore keep security top of mind, either by starting a cloud migration with data security or by ensuring that modernization efforts include dynamic access controls. Approaches like attribute-based access control (ABAC) and dynamic data masking help ensure that no data slips through the cracks, and avoid the need to spend time and effort retroactively building access management and anonymization processes.
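To make the idea concrete, here is a minimal Python sketch of how ABAC and dynamic data masking can work together. It is an illustration only, not Immuta’s or Databricks’ implementation: the User and ColumnPolicy classes, the attribute strings, and the apply_policies function are all hypothetical names invented for this example. The point is that access is decided at query time from a user’s attributes, so one policy can cover many users and tables without copying or pre-anonymizing the data.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record; attributes are free-form tags."""
    name: str
    attributes: frozenset


@dataclass(frozen=True)
class ColumnPolicy:
    """Hypothetical policy: a column is readable in the clear only by
    users who hold every required attribute."""
    column: str
    required_attributes: frozenset


def mask(value):
    """Replace a value with a short, irreversible hash token."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:8]


def apply_policies(user, row, policies):
    """Return a copy of the row with protected columns masked for this user.

    The stored row is never modified; masking happens at read time,
    which is what makes it "dynamic".
    """
    result = dict(row)
    for policy in policies:
        authorized = policy.required_attributes <= user.attributes
        if policy.column in result and not authorized:
            result[policy.column] = mask(result[policy.column])
    return result


# Illustrative data: all names and values below are made up.
policies = [ColumnPolicy("ssn", frozenset({"clearance:pii"}))]
row = {"customer_id": 1, "ssn": "123-45-6789"}

officer = User("officer", frozenset({"dept:compliance", "clearance:pii"}))
analyst = User("analyst", frozenset({"dept:marketing"}))

print(apply_policies(officer, row, policies))  # sees the original ssn
print(apply_policies(analyst, row, policies))  # sees a masked ssn token
```

Because the decision depends on attributes rather than on named users or roles, granting a new analyst access is a matter of assigning the right attribute, not writing another policy.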

“What sets [Immuta] apart from the rest is the attribute-based access control, which is the game changer for us. It made everything so simple. Anyone that’s not using it…you’re missing out.”

Corey Zwart, Head of Engineering, Pumpjack Dataworks

3. Unity Catalog and Immuta: Better Together

One of the most exciting announcements at the Data+AI Summit was the general availability release of Unity Catalog for AWS and Azure. Unity Catalog adds a unified governance layer to the Databricks Lakehouse platform, enhancing its ability to break down silos and enable seamless, secure data accessibility and collaboration.

Our team at Immuta has been working alongside Databricks on the development of Unity Catalog, and is equally excited about its GA release. As the first Databricks Security Partner to integrate with Unity Catalog, Immuta adds a layer of fine-grained access control that bolsters Unity Catalog’s foundational governance and lineage capabilities. Additionally, Immuta’s sensitive data discovery and classification, scalable enterprise policy orchestration, intent-based policies, and advanced data use monitoring features mean that joint customers can bypass common access control management challenges.

“The goal of our Unity Catalog is to make it easier for our customers to discover, audit and govern data assets in one place to help meet their compliance and privacy needs,” said Jonathan Keller, Senior Director of Product Management at Databricks. “Immuta’s automated and secure data policy engine is a key piece of this data governance puzzle, and we are thrilled to be joining forces to help in our effort of simplifying the process of securing and governing data and AI assets across multiple clouds.”

With Immuta acting as a centralized policy management and orchestration engine, and Unity Catalog solving for consistent enforcement across all data and AI assets, users are able to scale secure data use across the Databricks Lakehouse platform and reduce policy burden by 75x.

“Immuta will take advantage of Databricks’ Unity Catalog to ensure that our customers get a single source of truth to manage metadata, lineage, and access controls. I think it’s a fantastic simplified experience for our customers.”

Raja Perumal, Senior Alliance Manager, Databricks

What's Next?

It’s impossible to capture all the lessons and insights from the 2022 Data+AI Summit in one article, but these three widely acknowledged themes had attendees buzzing throughout the week. Ultimately, the future of data use relies on innovative technologies that can deliver high performance and insights in real time. None of this is possible, though, unless data security is accounted for from the start. With Immuta’s Unity Catalog integration, Databricks users have never been in a better position to analyze, share, and unlock more data, while ensuring that data remains protected.

Want to see just how easy it is to get started? Check out our self-guided demo now.
