
4 Privacy, Risk, AI, and Machine Learning Predictions for 2019 from Immuta CEO Matthew Carroll

Data Privacy Legislation in the U.S. Will Become a Real Conversation

The run-up to the 2020 U.S. presidential election will dominate news cycles in 2019. While residual issues from 2016, such as the manipulation of media, the use of propaganda on social channels, and the influence of third parties on our politics, will remain central, privacy will emerge as a critical issue that candidates need to address.

Since the enactment of the General Data Protection Regulation (GDPR), there has been a greater push for federal data privacy legislation in the U.S. Mainstream discussions of this topic will filter into the world of technology and could divide Big IT. The stage has been set for this debate: Apple and IBM have publicly taken a “privacy first” approach, while other big tech companies have remained silent on the subject.

The reality is that for the Global 2000, the E.U.’s GDPR is very real, and U.S. legislation will merely be an extension of its existing governance, risk, and compliance (GRC) efforts. This conversation around privacy will force the line of business to work with GRC to engage consumers about what data is being used, and for what purpose, even outside of European operations. It will force businesses to truly treat data residency as a line-of-business issue as much as any other key business dependency.

Managing Risk and Building Trust Within AI/ML Will Become a Requirement, Not Simply an Interest

As Artificial Intelligence (AI) and Machine Learning (ML) investments mature, and as data science becomes a democratized function across the enterprise, the number of models in production will increase exponentially. This growth will make it nearly impossible to rely on the manual, ad hoc governance and control methods used in the past.

Instead, automated systems and processes will be put into place to manage the risk of AI/ML. This systematized, automated approach to scaling governance of AI/ML – which doesn’t exist today – will be a key focus for large enterprises in 2019. These enterprises will rely on machine-based processes to manage trust between the data scientists, the models they create, and the business units that rely on the models.

Trust will be the core need for every team that interacts with these models – from the line of business, to governance, risk, and compliance personnel, to the data scientists themselves. Each group will need to trust the data used by the models (encompassing issues of data quality and integrity); to trust that the policies on the features within the data are correct and that regulations are followed and audited for compliance; to be comfortable with the level of transparency and explainability of the model type; and to be able to judge whether the outcome of the model is good or bad. Finally, each must know when something has significantly changed in the model or the underlying data that could require the model to be pulled from production or retrained.
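To make that last requirement concrete, below is a minimal sketch of the kind of automated drift check such machine-based trust processes would run, flagging when the live distribution of a feature no longer matches the training distribution. The helper name, thresholds, and data here are illustrative assumptions, not a description of any existing product.

```python
# A minimal, hypothetical drift check (thresholds are assumptions).
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_col, live_col, p_threshold=0.01):
    """Flag a feature whose live distribution has drifted from training.

    Uses a two-sample Kolmogorov-Smirnov test: a small p-value means the
    two samples are unlikely to come from the same distribution.
    """
    _statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < p_threshold

# Synthetic usage: a shifted live distribution trips the alarm.
rng = np.random.default_rng(0)
training = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.4, scale=1.0, size=5_000)  # the mean has drifted
if detect_feature_drift(training, live):
    print("Drift detected: consider retraining or pulling the model.")
```

In a production setting a check like this would run per feature on a schedule, feeding the alerting that tells the business when a model can no longer be trusted.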

The Line of Business Will Take Back the Reins

Since the beginning of the Big Data era, circa 2009, IT has owned most critical enterprise IT programs. This was in part due to guidance to migrate disparate, stove-piped data out of the lines of business and into consolidated data lakes to ease the burden of data access.

The idea was that these moves would provide standardized data management features across all lines of business. The reality was that IT did not need domain expertise in every dataset; rather, the focus was on enabling rapid app development, in which software engineers would build features driven by users with domain knowledge.

But unlike apps, where requirements can be built over months, data science moves much faster, and its requirements are typically fuzzier. This means that IT can no longer own the data and the IT programs as it once did. Instead, the line of business will need to take back the reins and become more involved.

Cyber Attacks on Enterprise AI/ML Become Real

As enterprises scale their machine intelligence efforts in 2019, so will adversaries seeking to exploit analytics as an attack vector. Unlike the majority of ransomware and malware attacks we typically see, these attacks will not focus on stealing or ransoming data. This new breed of attack will try to exploit model behavior, either by finding adversarial examples or by altering the data through influence over the training set.

This latter class of attacks, collectively known as poisoning attacks, occurs when adversaries leave specially crafted data in places where it could be ingested during model training. These won’t be errant (e.g., misclassified) samples or data injected “out of band” by bypassing security mechanisms. Instead, these attacks may occur when an adversary, perhaps acting as a data subject, alters their behavior to provide a large number of carefully chosen training samples through regular channels. The accumulation of enough poisoned samples enables an adversary to amplify and shape existing model failure modes for subsequent exploitation in production.
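The mechanics are easy to see in miniature. The following sketch, built entirely on synthetic data with illustrative numbers rather than any real incident, shows how a modest fraction of carefully mislabeled samples, submitted through the same channel as legitimate data, measurably shifts a simple classifier's confidence near the region the adversary targets:

```python
# Toy poisoning demo on synthetic data (all numbers are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Clean two-class data: class 0 around (-1,-1), class 1 around (+1,+1).
X_clean = np.vstack([rng.normal(-1, 0.5, size=(500, 2)),
                     rng.normal(+1, 0.5, size=(500, 2))])
y_clean = np.array([0] * 500 + [1] * 500)

# Poison: 5% extra samples placed in class-1 territory but labeled class 0,
# contributed "through regular channels" alongside legitimate data.
X_poison = rng.normal(0.8, 0.2, size=(50, 2))
y_poison = np.zeros(50, dtype=int)

clean_model = LogisticRegression().fit(X_clean, y_clean)
poisoned_model = LogisticRegression().fit(
    np.vstack([X_clean, X_poison]),
    np.concatenate([y_clean, y_poison]))

# Probe points near the region the adversary targeted: the poisoned
# model's confidence that these belong to class 1 drops sharply.
probe = rng.normal(0.6, 0.1, size=(200, 2))
print("clean model    mean p(class 1): %.2f"
      % clean_model.predict_proba(probe)[:, 1].mean())
print("poisoned model mean p(class 1): %.2f"
      % poisoned_model.predict_proba(probe)[:, 1].mean())
```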

Mitigating these kinds of attacks requires a deep understanding of the failure modes of your model. Imagine the ability to intelligently generate data that perturbs a model, map the features the model weighs as part of its decision process, and receive human-interpretable feedback. This will be a reality in 2019. Attackers benefit from the same transparency: initiatives like Open Banking in Europe effectively hand nefarious actors a validation set, an open forum in which to determine whether they have influenced financial decisions.
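As a concrete illustration of that kind of probing, consider a linear classifier, where the “map” of feature influence is simply the learned weight vector: stepping a sample against the sign of the weights is exactly the sort of intelligently generated perturbation described above. This is a hypothetical sketch on synthetic data, not a description of any particular product or attack.

```python
# FGSM-style perturbation of a linear classifier (synthetic, illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-1, 0.6, size=(300, 2)),
               rng.normal(+1, 0.6, size=(300, 2))])
y = np.array([0] * 300 + [1] * 300)
model = LogisticRegression().fit(X, y)

# For a linear model, the gradient of the score with respect to the input
# is the weight vector itself, so model.coef_ maps each feature's influence.
x = np.array([[0.3, 0.3]])                  # a point the model calls class 1
epsilon = 0.5
x_adv = x - epsilon * np.sign(model.coef_)  # small step against the weights

print("original prediction: ", model.predict(x)[0])    # 1
print("perturbed prediction:", model.predict(x_adv)[0])  # flips to 0
```

For deep models the gradient must be estimated rather than read off the weights, but the principle of mapping and perturbing the decision surface is the same.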

Enterprise AI, which is still in its nascence, is simply not ready for such sophisticated attacks. Systematization of proper ML governance, and of detecting and measuring anomalous data activity for model management, will become a major topic by the end of the year.