GDPR and Machine Learning, a Story of Transparency.

2018-07-12 13:15:17.909324

As we get closer to May 2018 (although a silent extension is underway), we are all becoming more familiar with GDPR, the new regulatory framework that the European Union imposes on all businesses operating in Europe. Since Europe is a market of five hundred million individuals with high disposable income, this regulation impacts practically every digital entity around the world, because it affects everyone that keeps or processes data from EU citizens (Article 3). The interesting thing is that this legal framework comes with a huge penalty for non-compliant entities, which can reach 4% of global revenue. Yes, you read that correctly: 4% of total worldwide annual turnover or €20 million (Article 83, Paragraph 5), whichever is higher.
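To make the "whichever is higher" rule concrete, here is a minimal sketch of the arithmetic. The function name and the sample turnover figures are made up for illustration, and the result is only the upper bound of the fine, not what an authority would actually impose.

```python
def max_gdpr_fine(worldwide_annual_turnover_eur):
    """Upper bound of the fine: the higher of EUR 20 million or 4% of turnover."""
    return max(20_000_000, 0.04 * worldwide_annual_turnover_eur)

# Hypothetical companies, for illustration only.
small = max_gdpr_fine(50_000_000)      # 4% is EUR 2M, so the EUR 20M figure applies
large = max_gdpr_fine(2_000_000_000)   # 4% is EUR 80M, which exceeds EUR 20M
```

For a smaller business the fixed EUR 20 million figure dominates, which is why the cap is not only a concern for large enterprises.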

Now you understand why GDPR is literally the Y2K (millennium bug) of B2B marketing! What your newly appointed chief data officer (CDO) or your external consultant - who sees his audit pipeline growing exponentially - has probably not mentioned is a very interesting debate on machine learning model transparency, especially if part of your first-party data processing and decision making is based on complex models (oracles).

Clauses that affect machine learning models

First things first: among all the clauses in GDPR there is a concept called the "Right to Explanation". In plain English this means that all decisions on customer data - especially automated ones - are "contestable", so the company can be forced to explain:

what data points were used
how a decision was made (e.g. showing a related product)
how it ensures that no discrimination takes place (Article 22, Paragraph 4) through processing or inferring:

…personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.
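The last point in the list above can be partly checked with a simple audit of a model's inputs. The sketch below is only an illustration: the attribute names are hypothetical and loosely follow the special categories quoted above, and a name-based check like this catches explicitly named inputs only, not attributes that act as proxies for them.

```python
# Hypothetical attribute names, loosely following the special categories quoted above.
SPECIAL_CATEGORIES = {
    "ethnic_origin", "political_opinion", "religion", "trade_union_membership",
    "genetic_data", "biometric_id", "health_status", "sexual_orientation",
}

def audit_features(feature_names):
    """Return the model inputs that directly name a special category."""
    return sorted(set(feature_names) & SPECIAL_CATEGORIES)

# Hypothetical input list of a recommendation model.
model_inputs = ["visits_last_30d", "health_status", "viewed_category", "religion"]
flagged = audit_features(model_inputs)
```

Here `flagged` lists the inputs that would need to be removed or justified before the model is used for automated decisions.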

A very interesting overview from a legal perspective can be found in Goodman, B., & Flaxman, S. (2016). "European Union regulations on algorithmic decision-making and a 'right to explanation'", 1–9. Available at http://arxiv.org/abs/1606.08813

In this paper there is a very interesting discussion about certain aspects of explaining an algorithm and how this can be extended to machine learning use cases.

Types of Models

As data companies and third-party data processing bodies, we have two approaches at our disposal: use complex models and try to interpret them, or use transparent models to begin with.

Complex Models: Oracles such as recurrent (RNN) or deep neural nets that handle feature extraction and model training in a single step. Neural nets for natural language processing (NLP) and computer vision (CV) are very accurate, but they are often complex, expensive, overkill and hard to interpret. A bonus point in our approach is the use of LIME, a very interesting project that uses perturbations of the input to build a simple, interpretable approximation of a more complex model around a single prediction. LIME is an acronym for Local Interpretable Model-agnostic Explanations, which is super cool. A self-explanatory example from computer vision, taken from the team's insightful blog post, is displayed below. Evidently the model identifies the frog by its head, rather than by other parts (pixel groups) of the photo.

Figure: LIME explanation of the frog image
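The perturbation idea can be sketched in a few lines of plain Python. This is not the LIME library or its API, only a simplified illustration in the same spirit: sample points around one input, weight them by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. The `black_box` function, the sampling scale and the kernel width are all assumptions chosen for the example.

```python
import math
import random

def black_box(x):
    """Hypothetical stand-in for a complex model: a nonlinear score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.5 * x[1] ** 2)))

def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def explain_locally(predict, x0, n_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around x0; return per-feature weights."""
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the normal equations of weighted least squares.
    ata = [[0.0] * (d + 1) for _ in range(d + 1)]
    aty = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]       # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)        # closer samples count more
        row = [1.0] + [zi - xi for zi, xi in zip(z, x0)]     # intercept + centered features
        y = predict(z)
        for i in range(d + 1):
            aty[i] += weight * row[i] * y
            for j in range(d + 1):
                ata[i][j] += weight * row[i] * row[j]
    return solve(ata, aty)[1:]  # drop the intercept, keep the feature weights

weights = explain_locally(black_box, [0.0, 0.0])
```

Around the point `[0.0, 0.0]` the surrogate assigns a clearly positive weight to the first feature and a weight close to zero to the second, which is the kind of statement ("this input drove the decision, that one did not") an explanation request would ask for.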

Transparent Models: Simple and interpretable models that have strong intuition built into them, such as regression and other linear models, and decision trees. There is definitely a trade-off between these two approaches, but in marketing use cases our configuration space tends to have few parameters, with underlying white noise. Therefore simpler models, besides being transparent, also have the agility that comes with a low opportunity cost.
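As a rough illustration of why a linear model is easy to explain, the sketch below scores a customer and lists how much each input contributed to the decision. The feature names, weights and bias are invented for the example and do not come from any real model.

```python
import math

# Hypothetical, hand-picked weights for a linear (logistic) propensity model.
WEIGHTS = {"visits_last_30d": 0.08, "viewed_category": 1.2, "days_since_purchase": -0.03}
BIAS = -1.5

def score(features):
    """Probability-like score that the related product should be shown."""
    logit = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-logit))

def explain(features):
    """Per-feature contributions to the logit, largest absolute effect first."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical customer record.
customer = {"visits_last_30d": 10, "viewed_category": 1, "days_since_purchase": 20}
ranking = explain(customer)
```

The ranked contributions answer both "what data points were used" and "how the decision was made" directly from the model's parameters, with no separate explanation layer.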

Conclusion

Now you might ask yourself whether this is really an issue, or whether I am just making it up to market the decision making and reporting capabilities of Warply Engage. Because no national regulatory authority has started auditing data policies (there is no precedent whatsoever), and because authorities will likely be very reluctant to act during the first months of application, there is room for clarification. In any case, CDOs that plan ahead need to put together a long-term data management strategy and a marketing stack that takes compliance very seriously. Drop us a line if you want a partner that does the same.

John Doxaras

2018-07-12 13:08:08.442236