About the organization
This health data nonprofit is focused on accelerating research in which novel data can provide valuable insights and open doors for ethical public problem-solving.
The organization was initially established in response to the COVID-19 pandemic and has evolved to tackle other critical policy issues through data. It gathers commercial, donated, and public-access data sets for analysis by its volunteer corps of data scientists and academic researchers.
With access to this data, different classes of researchers are able to collaborate to produce replicable, modular research that can in turn be published in open-access journals and be used to drive social impact and nonpartisan policy work.
As we add more components in a cloud database environment, it’s much cleaner than any sort of on-prem situation we’ve had before. The ability for us to manage access controls, deploy privacy enhancing technologies, and rapidly implement novel frameworks of governance for our research teams has been a breath of fresh air, with no management or overhead costs for adding additional cloud database solutions.
The organization’s mission of enabling ethical public problem-solving depends on its ability to provide researchers with novel data sets, most of which contain highly sensitive data. For example, under its research program, the nonprofit used a geolocation data set derived from 50 million Americans’ mobile phones during the 2020 Thanksgiving holiday to understand how widely people practiced social distancing, in support of efforts to control the spread of the virus.
This and other sensitive data, however, can be easily reidentified — an important concern for the nonprofit due to legal, ethical, and contractual guidelines that protect against reidentification. It was critical for the organization to find a way to balance data privacy and utility through a mix of data controls, such as masking, and context controls, such as role- and attribute-based access controls.
A key component of the nonprofit’s approach to research and problem-solving is bringing together diverse contributors from interdisciplinary backgrounds. Consequently, it recruits data science and data engineering volunteers from various industries and facilitates their collaboration with academic social science and economics researchers from universities and think tanks.
Given this diverse and ever-changing group of contributors, the nonprofit must share sensitive data stored in a compartmentalized data lake (using Snowflake as the primary datastore) with many different researchers, all of whom need access to different data for varying purposes through the analytics environment. Previously, the process involved reviewing each research proposal and then creating new native views and roles in Snowflake to govern access to sensitive data for each research project. This approach quickly became too cumbersome and complicated, particularly in the context of a fast-spreading virus for which real-time data was paramount.
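To make the scaling problem concrete, the sketch below generates the kind of per-project Snowflake statements that a manual, view-and-role process implies. It is only an illustration: the function name, project name, table, columns, and user names are all hypothetical and are not taken from the nonprofit's actual environment.

```python
def project_grants(project: str, table: str, columns: list[str],
                   researchers: list[str]) -> list[str]:
    """Build the SQL a data engineer might write by hand for ONE project.

    All object names are hypothetical; this only shows how the number of
    statements grows with every new project and every new researcher.
    """
    role = f"{project}_ROLE"
    view = f"{project}_VIEW"
    statements = [
        f"CREATE ROLE {role};",
        # A secure view exposing only the columns approved for this project.
        f"CREATE SECURE VIEW {view} AS SELECT {', '.join(columns)} FROM {table};",
        f"GRANT SELECT ON VIEW {view} TO ROLE {role};",
    ]
    # One more grant for each researcher who joins the project.
    statements += [f"GRANT ROLE {role} TO USER {user};" for user in researchers]
    return statements


if __name__ == "__main__":
    for stmt in project_grants("THANKSGIVING_MOBILITY", "RAW.GEO_PINGS",
                               ["county", "day"], ["alice", "bob"]):
        print(stmt)
```

Every new proposal repeats this pattern with a different view definition, which is why the number of roles and views to maintain grows with the number of projects and contributors.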
With limited resources, it was also critical for the process to be easily managed by the data engineers, who set up the analytics environments for the researchers.
To deliver on its mission, the nonprofit partnered with Immuta to build a data security layer that simplifies data sharing while preserving both data privacy and utility.
When researchers want to begin a new project, they can discover data sets with Immuta’s automated data discovery and classification feature. Self-service workflows enable researchers to request access to data, acknowledge approved usage purposes, request access control changes, and propose new projects.
This approach greatly simplifies the data request process for the data engineering team. New researchers now get prompt access to data, but only to the data required to work on their problem scope, reducing the time to utilize data from months to days.
In addition to removing barriers to data access, the nonprofit leverages Immuta to automate the enforcement of fine-grained, attribute-based access controls (ABAC) and privacy-enhancing technologies (PETs) on the data stored in Snowflake. Instead of relying on Snowflake controls, nontechnical data governance experts can define robust access policies in Immuta. These policies are then natively enforced when users interact with data in Snowflake.
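The general idea behind attribute-based access control can be sketched in a few lines. The example below is a minimal illustration of the concept, not Immuta's policy engine or API: the `User` and `Dataset` classes, the `can_read` function, and the attribute names (`purpose`, `training`) are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    # Attribute name -> set of values the user holds (hypothetical schema).
    attributes: dict[str, set[str]]


@dataclass
class Dataset:
    name: str
    # Tag name -> value a user must hold to read the data (hypothetical schema).
    tags: dict[str, str]


def can_read(user: User, dataset: Dataset) -> bool:
    """Allow access only if the user holds a matching value for every tag."""
    return all(
        required in user.attributes.get(tag, set())
        for tag, required in dataset.tags.items()
    )


if __name__ == "__main__":
    mobility = Dataset("geo_pings", {"purpose": "covid-mobility",
                                     "training": "privacy-101"})
    alice = User("alice", {"purpose": {"covid-mobility"},
                           "training": {"privacy-101"}})
    bob = User("bob", {"training": {"privacy-101"}})
    print(can_read(alice, mobility), can_read(bob, mobility))
```

Because the single rule in `can_read` is evaluated against attributes, onboarding a new researcher means assigning attributes rather than writing a new role or view.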
This new approach greatly simplifies the process of preparing and sharing sensitive data for analytics, eliminating the complexity of creating roles for every researcher and building secure views and complex, role-based functions to implement advanced PETs like format-preserving masking or dynamic k-anonymization.
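For readers unfamiliar with k-anonymization, the sketch below shows one simplified form of it: suppressing quasi-identifier values in any row whose combination of those values appears fewer than k times. This is an assumption-laden illustration of the concept only, not the algorithm Immuta applies; the function name, column names, and sample rows are hypothetical.

```python
from collections import Counter


def k_anonymize(rows: list[dict], quasi_identifiers: list[str], k: int) -> list[dict]:
    """Return copies of rows with rare quasi-identifier combinations nulled out.

    A combination is "rare" when fewer than k rows share it. Other columns
    are left untouched, and the input rows are not modified.
    """
    def combo(row: dict) -> tuple:
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(combo(row) for row in rows)
    result = []
    for row in rows:
        masked = dict(row)  # shallow copy so the caller's data is unchanged
        if counts[combo(row)] < k:
            for q in quasi_identifiers:
                masked[q] = None
        result.append(masked)
    return result


if __name__ == "__main__":
    sample = [
        {"zip": "02139", "age_band": "30-39", "visits": 4},
        {"zip": "02139", "age_band": "30-39", "visits": 7},
        {"zip": "99501", "age_band": "80-89", "visits": 1},
    ]
    for row in k_anonymize(sample, ["zip", "age_band"], k=2):
        print(row)
```

Applying a rule like this at query time, rather than materializing a separate masked copy per project, is the sense in which such masking is described as dynamic.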
By implementing Immuta, the nonprofit was able to:
- Save more than $1 million annually in data engineering costs
- Simplify the data request process for researchers, reducing the time to data by 30x, from 90 days to 3 days
- Eliminate the need to manage hundreds of individual data access policies by moving from RBAC to ABAC
- Reduce the number of access control policies to fewer than 10, which can be easily authored by nontechnical data governance experts