SXSW, Big Data, and the Law

On Saturday, I had the privilege of giving a talk at SXSW on how software is eating the law. I thought I’d take to the blog to share a few thoughts on my time in Austin and some of my conversations there. Specifically, what struck me is how many organizations are confronting variants of the same problem. That is, a few years after the term “big data” hit its apogee, many enterprises are left wondering: where is all the value?

The fact is that data science teams face a whole host of deep challenges, many of which are not immediately visible to folks in the C-suite, and these challenges are holding organizations back when it comes to their data science programs.

To be successful, for example, data science teams need to tackle a sprawling web of challenges. They need to confront legacy architecture issues—figuring out where all the data in the organization is, who owns and maintains it, and what format it’s in. They need to solve for these issues by building or finding tools that let them get their hands on that data in the first place. And then, perhaps the most difficult task of all, they need to convince governance personnel that everything they want to use that data for is fully compliant with internal and external rules and policies. And this interaction with governance teams isn’t simply one conversation, just to get them started; it’s ongoing, day after day. Neither team seems to be fully equipped to understand the other’s problems.

How many data science teams, for example, fully understand the mandates of European data regulations? Or the implications that breaking those regulations can have upon a company? And how many lawyers fully understand the difficulties in preserving data provenance when training a machine learning model? Or in retraining that model when the data or the rules change?

The fact is that the objectives of data science teams and governance teams aren’t just misaligned in today’s enterprises—they are frequently directly at odds with one another.

Success to data science teams means turning every chunk of available data into value; success to lawyers, on the other hand, means maintaining full control over, and transparency into, how and why that data is being used. And in a world where software itself, and AI in particular, is increasingly opaque, maintaining this control is increasingly complicated.

Which brings me to the theme of my talk: how law and technology are going to overlap in technical ways to solve some of the most complex issues facing the tech industry. When I look at the regulatory issues facing the tech industry, I see a real opportunity to embed laws within software. And lest you think that compliance and data science don’t mix well at a technical level, Pennsylvania State University researcher Nicolas Papernot has been doing some fascinating research that shows how privacy-preserving machine learning models can actually have greater predictive accuracy than models that don’t control for privacy. (To oversimplify: the less a model is affected by any one entry, the more accurate its predictions are going to be in general.) And when it comes to the importance of governance itself at a larger level, economists have demonstrated that companies with serious governance programs actually perform better in the markets.

The fact is that the cultural barriers between data science teams and lawyers are real, and they appear to be growing. The good news is that organizations seem to be waking up to this challenge, as was evident in conversation after conversation at SXSW. And the faster lawyers and data scientists can understand each other’s concerns and issues, and solve for them, the faster both teams can make all the benefits of big data real.

Update: SXSW just released audio of the talk.