If you work in the field of Artificial Intelligence (AI), you might wonder "Why should I care about the GDPR? What does it even have to do with AI?"
Great questions! There are a lot of GDPR articles that are relevant for AI. In this blog post, I'd like to discuss one specific Article, which gets a lot of attention and, I find, also leads to a lot of confusion and misunderstanding. And this is — drum roll, please — Article 22 “Automated individual decision-making, including profiling”.
The confusion around Article 22 is — in my opinion — due to the fact that (i) the GDPR is new for most (AI) people, and (ii) the GDPR is not something AI people (or most people, for that matter) work with on a daily basis, nor do they have a background in it. This leads to (misleading) statements like “Now that we have the GDPR, we can’t do Machine Learning anymore!” or “We can’t use any personal data anymore ‘thanks’ to Europeans!”. It is certainly not that black-and-white, but the devil is in the details…
As I indicated in my previous blog post, this will be one of my longer blog posts. Simply because of the topic discussed. So, please, bear with me; it will be worthwhile (I hope)!
Before I get into it, a couple of remarks:
- I want to explicitly state that we at a3i are not legal experts in the GDPR (or in any legislation for that matter). However, the GDPR, albeit complex, is actually a readable document (I find). Besides, there are some very good articles out there that discuss the GDPR articles in more detail and put them in context. A good example is the guidance issued by the Article 29 Working Party.
- Despite the facts that (i) the GDPR has been years in the making and (ii) it is also (heavily) based on the previous EU (European Union) Data Protection Directive 95/46/EC, it is still new legislation. I fully expect that in the next few years, there will be lots of discussion and jurisprudence on the applicability and interpretation of some GDPR articles. We already see this with the establishment of the previously mentioned Article 29 Working Party. This group has issued guidelines (and will continue to do so) to help better interpret and understand some of the GDPR articles. For instance, they have issued guidelines for 'Automated decision-making and profiling’, ‘Transparency’, and ‘Consent'. As such, this blog post is a snapshot in time and its content may be subject to change.
OK, let's get started and first summarize what the GDPR is about.
GDPR in a nutshell
GDPR stands for the General Data Protection Regulation, the new legislation regarding the data protection of EU citizens. It replaced the previous EU Data Protection Directive 95/46/EC with the intent to establish one single, consistent set of rules across Europe. The GDPR applies directly in every member state (hence ‘regulation’), while the previous Directive had to be transposed into national law, so EU countries could (and did) deviate from, limit, or extend it. It is also important to note that the GDPR’s scope is not limited to the EU countries per se. If your organization is outside the EU and collects/accesses data concerning EU citizens, the GDPR may apply, regardless of where you are located and depending on the context.
Although readable, the GDPR is also a long document: 200+ pages, with 99 articles grouped into 11 chapters, and 173 recitals. A recital is an explanation of an article or set of articles. However, recitals are not legally binding; they are meant more as ‘background explanation’ or to put an article ‘in perspective’. It is possible that — over time — some text in a recital will become part of an article, making it legally binding.
“Hello Article 22! What are you all about?”
Article 22 probably takes the grand prize for attention in the AI world because:
- There is confusion and potential disagreement on what Article 22 exactly means. It is certainly not a black-and-white situation, given the exceptions and ‘depending on the circumstances’ type of scenarios.
- Our society has become data-driven and automated individual decision-making is becoming mainstream (e.g., in the financial sector, healthcare, taxation, and insurance). As such, the GDPR might or might not limit your AI system in what you can or have to do when you are ‘in the business of automated individual decision-making, including profiling’.
- Profiling takes centre stage in, for instance, marketing (e.g., market segmentation), the justice system (e.g., recidivism prediction), and online & social media (e.g., recommender systems in Netflix and Amazon, news feeds in Facebook and Twitter, or the fact that your Google search results are highly tuned to your profile and search history). Same here: the GDPR may or may not limit your AI system.
I will break it down into different parts for a better understanding. Let’s first start with a few words on ‘profiling’ and ‘automated decision-making’, as those are key concepts in Article 22.
I am pretty sure you have an idea of what profiling is — at least from your perspective. But the GDPR defines profiling in Article 4(4) as “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyze or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.”
Quite a mouthful, but it is essentially composed of three elements:
- It has to be an automated form of processing;
- It has to be carried out on personal data; and
- The objective of the profiling must be to evaluate personal aspects about a natural person.
In our data-driven society, we are profiled in one shape or form. For better or worse; with or without our knowledge and/or consent; every single day. Granted, this applies more in certain industries or social/online media than in others, but still: profiling has become part of our lives.
If you read Article 4(4) word for word, you will notice that it states “any form of automated processing…”. The Article does not state ‘solely automated processing’, which is the ability to make decisions by technological means without human involvement. This distinction is important for Article 22; I will touch on this later.
For now, if we look at ‘automation’ and ‘profiling’, we can have profiling with ‘some (small) form of automation’, with ‘solely (‘full’) automation’ or anything in between. And vice versa, we can have automated decision-making with or without profiling.
However, it is because of the automation we have at our disposal nowadays that we can do a very accurate job of profiling, thanks to the sheer amount of (big) data, fast compute power, massive storage capabilities, and smart AI algorithms/models. As a matter of fact, without full automation, today’s profiling is simply not feasible, or sub-optimal at best.
The human involvement actually comes from the AI specialists (e.g., Machine Learning engineers, software programmers, data scientists), who collectively build the profiling (and other) AI systems. After that, the profiling pretty much runs ‘on its own’ (i.e., fully automated). Humans will still be involved in that case (e.g., for quality control, legal issues, and optimization of the system), but the actual flow of ‘Data In —> Analytics —> Profile Out’ is virtually 100% automated. And even if we want to involve humans as part of the profiling process, some AI systems (e.g., deep learning networks) have become so complex (e.g., in the number of profiling parameters involved and all their interactions) that it is virtually impossible for humans to get their heads around them. We might be able to explain how the system was built (e.g., type of neural network, initial parameters, training/test/validation sets, regularization techniques, and so on), but we can’t (always) explain the profiling results the system generates.
Whether we can explain the results or not depends on the AI system used. For instance, if we look at the more ‘simple’ AI systems, like linear or logistic regression (the latter being a classifier), humans can still be very much in the loop, to the point where they can even explain the system and its results. Take, for instance, a mortgage application where the bank uses logistic regression to put you in a certain profile (e.g., ‘high income, very good credit score’). Here we have two choices: (i) the system decides whether you are awarded the mortgage and under which conditions, or (ii) the system proposes an advice to the financial advisor, who will ultimately decide. In the former case, we can at least probably explain why the system made that choice (as we use a ‘simple’, explainable AI system). In the latter case, the financial advisor may incorporate other data items, which are not part of the AI system, in order to make the decision. This in turn may lead to another very interesting and challenging topic in AI systems: bias in data and decision-making. But that is for a different blog post.
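To make the mortgage example a bit more concrete, here is a minimal sketch of why such a ‘simple’ system is explainable. All numbers, features, and the approval criterion are hypothetical, purely for illustration; the point is only that a logistic regression’s weights can be read off and explained to a human.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy historical data: (income in 100k EUR, credit score 0-1) -> approved?
data = [((0.30, 0.2), 0), ((0.45, 0.4), 0), ((0.60, 0.6), 1),
        ((0.80, 0.8), 1), ((0.95, 0.9), 1), ((0.25, 0.1), 0)]

# Train with plain stochastic gradient descent on the log-loss.
w = [0.0, 0.0]
b = 0.0
learning_rate = 0.5
for _ in range(2000):
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = p - y
        w[0] -= learning_rate * err * x1
        w[1] -= learning_rate * err * x2
        b -= learning_rate * err

def approval_probability(income, credit_score):
    return sigmoid(w[0] * income + w[1] * credit_score + b)

# The 'explanation' a human can give: each weight is that feature's
# contribution to the log-odds of approval.
print("income weight:      ", round(w[0], 2))
print("credit-score weight:", round(w[1], 2))
print("applicant (0.70, 0.7):", round(approval_probability(0.70, 0.7), 2))
```

With only two weights and a bias, a financial advisor can trace exactly how each feature pushed the score up or down for a given applicant, which is precisely what a deep network with millions of parameters no longer allows.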
OK, time to break down Article 22…
The first clause of Article 22 [denoted as ’22(1)’] states “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.” [italics added by me].
A data subject is any person whose personal data is being collected, held or processed by a data controller. A data controller is the organization or individual who determines what happens with personal data. There is also the data processor, which is the organization or person who processes data on behalf of the controller. The controller and processor can be the same organization/person or different.
At first, this reads as if Article 22(1) prohibits decision-making in general, if it is based solely on automation. It is important to note that this prohibition applies whether or not the data subject takes an action (e.g., not providing consent or asking for an explanation). As such, you might initially think that this takes AI systems into murky waters… And it may, but (i) only in specific circumstances and/or (ii) when you don’t pay attention to this or simply don’t realize it. It is important to understand that 22(1) only applies when a decision — based solely on automation (including profiling) — has (i) a legal effect or (ii) similarly significantly affects someone. And even in those cases, exceptions are defined which do allow such processing to take place.
As we read earlier, solely automated means there is no human involvement in the decision process. The (immediate) question that comes to mind is: “Can we avoid Article 22 by just fabricating ‘some kind of human involvement’?” Like “An employee routinely applies, without further influence or evaluation, automatically generated profiles to individuals by just clicking on a button for every individual.” Bad luck: according to the GDPR, this would just be a token gesture, and as such, the process would still be considered solely automated.
To be qualified as human involvement, you (the data controller) must ensure that any oversight of the decision is meaningful. It should be carried out by someone who has the authority and competence to change the decision.
And this is where the rub might be, as the question becomes: “To what extent am I required to explain my AI system? Is it enough if I can explain ‘just’ the logic and the setup of the system, or am I required to explain its results in detail as well?” As explained earlier, depending on the AI system used, humans cannot possibly explain (certain) outcomes/results anymore…
The AI research community is starting to take notice, not so much because of the GDPR, but because of the explainability and transparency of AI models in general. Examples of such research are Local Interpretable Model-agnostic Explanations (LIME) and Generalized Additive Models (GAMs).
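To give a flavour of what ‘model-agnostic explanation’ means, here is a deliberately simplified sketch of the idea behind LIME-style local explanations. This is not the actual `lime` library or algorithm; the black-box model and its two features are made up for illustration. The idea: we cannot open the model, but we can probe it around one specific input and estimate which feature locally drives that particular decision.

```python
import random

random.seed(0)

def black_box(income, credit_score):
    """Stand-in for an opaque model whose internals we cannot inspect.
    We may only observe its decisions (here: 1 = approve, 0 = deny)."""
    return 1.0 if (0.8 * income + 1.5 * credit_score ** 2) > 1.0 else 0.0

def local_influence(model, point, n_samples=2000, radius=0.2):
    """Crude per-feature local influence around one specific input:
    the average change in the model's output when that feature alone
    is nudged up versus down, with all other features held fixed."""
    influence = []
    for i in range(len(point)):
        total = 0.0
        for _ in range(n_samples):
            delta = random.uniform(0.0, radius)
            up, down = list(point), list(point)
            up[i] += delta
            down[i] -= delta
            total += model(*up) - model(*down)
        influence.append(total / n_samples)
    return influence

# Explain one applicant's decision, not the whole model: which feature
# locally pushes this particular outcome, and in which direction?
applicant = (0.7, 0.5)  # (income, credit score) near the decision boundary
inf_income, inf_score = local_influence(black_box, applicant)
print("local influence of income:      ", round(inf_income, 2))
print("local influence of credit score:", round(inf_score, 2))
```

Note that the answer is local: for this applicant, the credit score happens to matter more than income, while for an applicant elsewhere in the feature space the picture can look completely different. The real LIME goes further and fits a weighted linear surrogate model around the instance, but the probing idea is the same.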
Legal or similarly significant effects
Despite the fact that the GDPR recognizes that automated decision-making, including profiling, can have serious (legal) consequences for individuals, it does not define what ‘legal’ or ‘similarly significant’ means. So, we have to fall back on common legal interpretations and definitions.
For instance, a legal effect affects someone’s legal rights (e.g., the freedom to associate with others or to vote in an election) or a person’s legal status (e.g., denial of a particular social benefit because that person has the “wrong profile”). But even if no legal effects apply, the decision-making or profiling might still produce an effect that is similarly significant in its impact. Recital 71 gives a few examples: “automatic refusal of an online credit application” or “e-recruiting practices without any human intervention”.
As indicated earlier, there are exceptions in which case Article 22(1) does not apply. The three exceptions are defined in Article 22(2)(a-c). Of those three, Article 22(2)(c) might be of interest, as it states that 22(1) does not apply in case the data subject has given explicit consent.
“Hum, what does that mean…?”
Again, let’s break it down and look at ‘consent’ first. GDPR Article 4(11) defines consent as “any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.” Again a mouthful, but consent was already defined in the previous EU Directive 95/46/EC and is, as such, not new. The GDPR builds and extends on this definition (e.g., it is more specific on withdrawal).
If we break down Article 4(11), there is a lot to say about the individual words, like ‘freely given’, ‘specific’, and ‘informed’. As a matter of fact, the Article 29 WP has written an extensive guideline on just consent (which I highly recommend if you want to read more about the interpretation and thinking behind the GDPR principle of ‘consent’).
Let’s dig a little deeper in ‘freely given’, ‘specific’ and ‘informed’ in relation to AI systems.
Transparency is one of the fundamental GDPR principles and plays an important role in consent, in the sense that when a data subject gives consent to the data controller regarding the use and processing of their data, the decision to do so is (i) of free will, (ii) for a specific purpose, and (iii) informed.
(i) ‘Freely given’ implies a real choice and control for data subjects. There should be no ‘imbalance of power’ between the data controller and the data subject that would ‘force’ the data subject to consent out of fear of possible consequences or of ‘missing out’ if consent is not given.
(ii) ’Specific’, as the word implies, requires that the data subject knows exactly for which purpose their data is being used. And as such, also has a choice in case of multiple purposes (e.g., “It is ok to use my data to do anonymized research for cancer, but not ok to use my data for marketing purposes”).
(iii) ‘Informed’, again as the word implies, means that the data subject understands why, how, and for what purpose their data is used.
You can see that these are (heavily) intertwined with each other.
So, where is the potential rub?
I believe that for (some) AI systems, it might be very hard for a data subject to get a good understanding of the purpose for which their data is being used, let alone make an informed decision on whether or not to give consent. This will heavily depend on the complexity, transparency, and explainability of the system in question.
This may be particularly challenging when we talk about derived and inferred data, for which (explicit) consent might not have been given, simply because the data subject had no idea that this data even existed. On the other hand, the AI system developer may not always know upfront which data is going to be needed to get results, or which inferred data is coming out of the system. This might lead to a catch-22 situation: data subjects will not give consent because they don’t have ‘the full picture’ and, as such, are ‘not informed’, while the developer cannot proceed either, because he/she doesn’t have the full picture (yet), as that depends on the system.
The bottom line here is: can the developer explain their AI system to the data subject in layperson’s terms, so that the data subject has a clear and unambiguous understanding and can make an informed decision to give (explicit) consent? Elements of such an explanation are: What is the purpose of the system? How does it work? Which data is going to be used, how, and why? What type of results can be expected from the system, and how are these results being used, in particular in relation to the data subject?
I mentioned earlier that ‘explicit consent by the data subject’ is considered an exception to Article 22(1). Explicit here refers (i) to the fact that the data controller should undertake extra effort to obtain this consent from the data subject and (ii) to the way consent is expressed by the data subject. The point is that even if the developer of the AI system has ‘explicit consent’ figured out (e.g., in accordance with Recital 42), this is not of much use if the previously described elements of consent are not taken care of first…
If you are still with me, then I hope I have given some helpful insight (at least from my perspective) into Article 22. There are other interesting GDPR Articles and Principles that should be of interest to AI system developers. Examples are ‘Privacy by Design’, 'The Right to Access’ (Articles 12, 15 and 46), and 'The Right to Erasure [right to be forgotten]’ (Article 17). Those might be topics for another blog post.
I invite you to comment and/or share your ideas/thoughts on all of this!