Automate HIPAA De-identification Methods on Amazon RDS

Data engineers and product managers are often responsible for implementing controls and audit capabilities when managing healthcare data. To enable faster, data-driven innovation, these data professionals, particularly those who come to healthcare from other industries such as tech or financial services, apply the best practices they already know, such as deploying a proven data analytics stack and complying with a familiar set of regulations, for example financial ones.

But then reality sets in: a healthcare data platform without proper access control and security can enable re-identification of individuals from patient data. Data engineers and PMs may be personally liable for any re-identification.

This tutorial is written for innovators new to healthcare who want to implement modern data architectures in a legal and compliant way, while building trust with their compliance team. The goal is to enable speed of innovation without sacrificing security and governance.

Let’s start with understanding what HIPAA Safe Harbor requires:

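  • The 18 types of identifiers listed in the rule are removed from data sources.
  • Data Owners have no actual knowledge that Data Users could use the remaining information, alone or in combination with other information, to re-identify individuals.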
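
As a rough illustration of the first requirement, the sketch below lists the 18 identifier categories from 45 CFR §164.514(b)(2) and reports which exposed columns still carry one of them. The short tag names and the `remaining_identifiers` helper are hypothetical, made up for this example; they are not Immuta's tag vocabulary or API.

```python
# The 18 Safe Harbor identifier categories (45 CFR 164.514(b)(2)(i)),
# keyed by short tag names that are invented for this sketch.
SAFE_HARBOR_CATEGORIES = {
    "name": "Names",
    "geo": "Geographic subdivisions smaller than a state",
    "date": "Date elements (except year) directly related to an individual",
    "phone": "Telephone numbers",
    "fax": "Fax numbers",
    "email": "Email addresses",
    "ssn": "Social Security numbers",
    "mrn": "Medical record numbers",
    "health_plan": "Health plan beneficiary numbers",
    "account": "Account numbers",
    "license": "Certificate/license numbers",
    "vehicle": "Vehicle identifiers and serial numbers, including license plates",
    "device": "Device identifiers and serial numbers",
    "url": "Web URLs",
    "ip": "IP addresses",
    "biometric": "Biometric identifiers, including finger and voice prints",
    "photo": "Full-face photographs and comparable images",
    "other_unique_id": "Any other unique identifying number, characteristic, or code",
}
SAFE_HARBOR_TAGS = frozenset(SAFE_HARBOR_CATEGORIES)


def remaining_identifiers(column_tags):
    """Return exposed columns that are still tagged with a Safe Harbor category.

    column_tags maps a column name to the set of tag names on that column.
    """
    flagged = {}
    for column, tags in column_tags.items():
        hits = set(tags) & SAFE_HARBOR_TAGS
        if hits:
            flagged[column] = sorted(hits)
    return flagged


exposed = {"facility_name": {"name"}, "bed_count": set()}
print(remaining_identifiers(exposed))  # {'facility_name': ['name']}
```

An empty result covers only the first requirement. The second one, the absence of actual knowledge, is a human attestation and cannot be checked in code.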

This tutorial demonstrates how to use Immuta’s automated policies and auditing to apply the Safe Harbor method to a data set stored in Amazon RDS for PostgreSQL. The policy is enforced on-read, with no need to copy or move any data. The same automation steps apply to other data sources on AWS, such as Athena, MySQL, EMR, S3, Databricks, Microsoft SQL Server, Redshift, and others supported by Immuta.

Sample Healthcare Data Set Loaded into Amazon RDS for PostgreSQL

The public data set from SMRT Columbus lists licensed healthcare facilities in the state of Ohio and has been loaded into Amazon RDS for PostgreSQL. Because it is a public data set that anyone can access, it does NOT contain patient data. For the purposes of this article, however, it has the kinds of sensitive identifiers that would be classified as protected health information (PHI) in a patient data set. If this were patient data being used for analytics, it would need to be de-identified per 45 CFR §164.514(a)-(b), part of the section on uses and disclosures of PHI, by removing identifiers such as names, geographic subdivisions smaller than a state, dates directly related to an individual, phone numbers, and email addresses.
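
To make two of those identifier types concrete, here is a minimal sketch of Safe Harbor style generalization: reducing a date to its year and truncating a ZIP code to its first three digits. Under §164.514(b)(2), the three-digit prefix may be kept only when the area it covers contains more than 20,000 people; otherwise it is changed to 000. The function names are hypothetical, and `RESTRICTED_ZIP3` holds a made-up value; a real list of low-population prefixes has to be derived from current Census data. This is not a description of how Immuta masks values.

```python
from datetime import date

# Placeholder only: fill this from Census data with every three-digit ZIP
# prefix whose area contains 20,000 or fewer people. "999" is made up.
RESTRICTED_ZIP3 = frozenset({"999"})


def generalize_date(value: date) -> int:
    """Keep only the year of a date that relates to an individual."""
    return value.year


def generalize_zip(zip_code: str, restricted=RESTRICTED_ZIP3) -> str:
    """Keep the first three ZIP digits, or 000 for low-population prefixes."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in restricted else prefix


print(generalize_date(date(2019, 4, 1)))  # 2019
print(generalize_zip("43215"))            # 432
print(generalize_zip("99901"))            # 000
```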


Register the Amazon RDS Data Set with Immuta

From the Immuta console, click the data sources icon on the left, then click “+ New Data Source” to create a new PostgreSQL connection and select the table “OHIO_LICENSED_HEALTHCARE_FACILITIES”. No data is ever stored in Immuta, since this is a logical table.


Sensitive Fields are Automatically Discovered and Tagged

After you set up the Amazon RDS for PostgreSQL data source, Immuta discovers sensitive information in the data set. Sensitive Data Discovery is a capability built into Immuta that finds and tags sensitive fields, such as the names, dates, and geographic locations in this data set.
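
Immuta's discovery logic is not described in this article, so the following is only a toy sketch of the general idea: sample some values from each column, test them against patterns, and tag the column when most values match. The patterns, the 0.8 threshold, the tag names, and every column except `effectivedate` are assumptions made for illustration.

```python
import re

# Illustrative patterns only; real discovery needs far more robust rules.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
}


def discover_tags(samples, threshold=0.8):
    """Tag each column whose sampled values mostly match a pattern.

    samples maps a column name to a list of sampled string values.
    """
    tags = {}
    for column, values in samples.items():
        values = [v for v in values if v]  # ignore empty values
        if not values:
            continue
        for tag, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                tags.setdefault(column, []).append(tag)
    return tags


sample = {
    "effectivedate": ["4/1/19", "12/15/2018"],
    "phone": ["(614) 555-0100", "614-555-0199"],
    "bed_count": ["25", "110"],
}
print(discover_tags(sample))
# {'effectivedate': ['date'], 'phone': ['phone']}
```

Pattern matching of this kind produces both false positives and false negatives, which is one reason the tags are reviewed and certified by a person in a later step.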


HIPAA Safe Harbor Global Policy

On the policies tab, you will find that the policy is available by default and has been moved from the staged state to the active state at global scope. The rules in the policy are enforced based on the tagged fields and displayed in plain English, so you can show them to your compliance team for full transparency. Immuta includes a natural language policy builder if you need to create additional policies specific to your organization or to other regulations.
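
The idea behind a tag-driven global policy can be sketched in a few lines: rules are keyed by tag rather than by table or column, so any data source whose columns carry those tags picks up the same rules. The rule set, tag names, action names, and helper functions below are hypothetical and far simpler than Immuta's policy engine.

```python
# Hypothetical rule set: tag -> (action, plain-English description).
GLOBAL_POLICY = {
    "name": ("null", "Mask names by replacing them with NULL"),
    "date": ("year_only", "Mask dates by keeping only the year"),
    "geo": ("null", "Mask geographic subdivisions smaller than a state"),
}


def rules_for_columns(column_tags, policy=GLOBAL_POLICY):
    """Resolve which action applies to each column based on its tags."""
    resolved = {}
    for column, tags in column_tags.items():
        for tag in tags:
            if tag in policy:
                resolved[column] = policy[tag][0]
                break  # the first matching tag wins in this sketch
    return resolved


def describe_policy(policy=GLOBAL_POLICY):
    """Render the rules as plain-English lines for a compliance review."""
    return [f"{text} (tag: {tag})" for tag, (_, text) in policy.items()]


columns = {"facility_name": ["name"], "effectivedate": ["date"], "bed_count": []}
print(rules_for_columns(columns))
# {'facility_name': 'null', 'effectivedate': 'year_only'}
```

Keying rules by tag is what lets one policy apply across many data sources: a newly registered table needs correct tags, not a new policy.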


Certify the Policy against Amazon RDS for PostgreSQL

Navigate to the data source “OHIO_LICENSED_HEALTHCARE_FACILITIES”. Click “Sign and Certify” to certify that all 18 identifiers are properly tagged and that you have no knowledge that the data set can be used to identify individuals. If you prefer to use an external data catalog, its tags can also be integrated into Immuta.


Business Analysts Need to Provide Acknowledgement

Let’s change hats from the data architect to the business analyst who wants to consume data. When a business analyst requests access to the project that contains the data source “OHIO_LICENSED_HEALTHCARE_FACILITIES”, that person must agree to use the data set only for the stated purpose of the project, to refrain from sharing data outside of that project, and not to re-identify or take any steps to re-identify individual health information. This combination of steps serves as the official acknowledgement of compliance with the HIPAA Safe Harbor policy.
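
The acknowledgement step amounts to a simple gate: access is granted only after the requester has agreed to every required statement. A minimal sketch, with statement keys paraphrased from the paragraph above and a hypothetical `can_access` helper (not an Immuta API):

```python
# Statement keys paraphrase the three commitments described above.
REQUIRED_ACKNOWLEDGEMENTS = frozenset({
    "use_for_stated_purpose",
    "no_sharing_outside_project",
    "no_reidentification",
})


def can_access(acknowledged):
    """Return (allowed, missing): allowed only if nothing is missing."""
    missing = REQUIRED_ACKNOWLEDGEMENTS - set(acknowledged)
    return (not missing, sorted(missing))


print(can_access(["use_for_stated_purpose", "no_reidentification"]))
# (False, ['no_sharing_outside_project'])
```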


Access the De-identified Data Set

In the previous step, the business analyst agreed not to attempt to re-identify individual health information. But for the sake of this article, let’s assume he or she clicked through without reading anything and now tries to re-identify records for medical facilities in Columbus, Ohio with the “CL” code whose effective date was 4/1/19, based on an article from the Columbus Dispatch. The attempt is made by exporting data to Tableau Desktop and adding [effectivedate] to the details in order to filter by that date. But the automated global policy has masked the effective date, so the re-identification risk is mitigated by a technical control even when the analyst drills into the data.

Note that the data is accessed with the HIPAA Safe Harbor policies enforced on-read, without copying any data. The identifiers remain in the database in case access is required for different purposes.
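
On-read enforcement can be pictured as a masking layer that sits between the stored rows and the reader. The sketch below is a toy in-memory version, not Immuta's implementation: `read_with_policy`, the column actions, and the sample row are all invented for illustration. It yields masked copies while the stored rows stay untouched.

```python
from datetime import date

# Hypothetical per-column actions, as a resolved policy might provide them.
COLUMN_ACTIONS = {"facility_name": "null", "effectivedate": "year_only"}


def mask_value(value, action):
    """Apply one masking action to a single value."""
    if action == "null":
        return None
    if action == "year_only" and isinstance(value, date):
        return value.year
    return value  # no action for this column: pass the value through


def read_with_policy(rows, actions=COLUMN_ACTIONS):
    """Yield masked copies of rows at read time; the source is not modified."""
    for row in rows:
        yield {col: mask_value(val, actions.get(col)) for col, val in row.items()}


stored = [{"facility_name": "Example Clinic", "effectivedate": date(2019, 4, 1), "code": "CL"}]
print(list(read_with_policy(stored)))
# [{'facility_name': None, 'effectivedate': 2019, 'code': 'CL'}]
print(stored[0]["facility_name"])  # Example Clinic
```

Because masking happens per read, a different user or purpose could be given a different set of actions over the same stored rows.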


Prove Compliance with Audit Reports/Logs

When your compliance team or an auditor needs to understand more about the interaction, the audit log and reporting capabilities provide instant evidence of compliance with the Safe Harbor policy. The extract query from Tableau was logged at 23 Apr 2020 11:33:06 -0400 under the purpose “Re-identification Prohibited”. If you click into the log, you can see the user details, what data was accessed, and which policies were applied for de-identification.
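
An audit entry like the one described could be modeled as a small structured record. The field names below are assumptions for illustration and do not reflect Immuta's actual audit log schema; the timestamp, purpose, and table name are the ones quoted in this article, and the user name is invented.

```python
import json
from datetime import datetime

# Parse the timestamp exactly as it is quoted in the article.
logged_at = datetime.strptime("23 Apr 2020 11:33:06 -0400", "%d %b %Y %H:%M:%S %z")

audit_record = {
    "timestamp": logged_at.isoformat(),
    "user": "business.analyst@example.com",  # invented for this sketch
    "data_source": "OHIO_LICENSED_HEALTHCARE_FACILITIES",
    "purpose": "Re-identification Prohibited",
    "policies_applied": ["HIPAA Safe Harbor"],
    "client": "Tableau",
}

print(json.dumps(audit_record, indent=2))
```

Storing the timestamp in ISO 8601 form with its UTC offset keeps entries unambiguous when logs from different time zones are compared.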


Additional Resources

The prebuilt Safe Harbor data policy demonstrated in this article was developed jointly by our legal and software engineering teams. Legal engineers are lawyers with deep regulatory expertise who track regulations and map them to Immuta capabilities, in order to manage the impact on the data architects and data engineers who manage data. In contrast, I don’t claim to be an expert on healthcare regulation, and you may catch me spelling it HIPPA rather than HIPAA. This combination of legal and engineering expertise (sans spelling mistakes) is increasingly common as the healthcare industry continues to innovate.

The HIPAA Safe Harbor Global Policy can also be used to protect data sets on the AWS Data Exchange.

If you find value in this approach, you can request a demo.

Or, if you are interested in learning more about the automated HIPAA Safe Harbor Policy available in Immuta, check out the documentation. If you are interested in learning more about how we can help with HIPAA Expert Determination, please contact us.