The fifty shades of black in a black box AI system

Recently, I have been getting a lot of questions about what a so-called "black-box" AI system is and which challenges it poses, in particular related to trust in AI systems. This leads to the question: can an AI system be a black box and still be trusted?

In this blog post, I share my insights regarding the black box question. I will discuss the other question ("Can a black box AI system be trusted?") in my next blog post.

Ok, let's get started!

Let me start by saying that I find it very encouraging that in the AI world, researchers are stepping up to the plate and trying to 'open up' the black box (e.g., Local Interpretable Model-agnostic Explanations [LIME] and Generalized Additive Models with Pairwise Interactions [GA2Ms]). In reality, there are a lot of (technical) details to this, which I will not discuss in this blog post. Instead, I make the claim that a black box can have different shades of black, depending on (i) the eye of the beholder and (ii) whether we talk ex-ante or ex-post.
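To give a flavour of what this 'opening up' looks like in practice, here is a minimal sketch of how LIME can be asked to explain a single prediction of an otherwise opaque model. The classifier, data, and feature/class names are made-up placeholders, and the sketch assumes the `lime` Python package is available; the point is only that LIME treats the model as a black box and probes it locally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from lime.lime_tabular import LimeTabularExplainer

# Made-up training data and an arbitrary 'black box' classifier
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# LIME only needs the training data and a prediction function
explainer = LimeTabularExplainer(
    X,
    feature_names=[f"feature_{i}" for i in range(X.shape[1])],
    class_names=["dislike", "like"],
    mode="classification",
)

# Explain one prediction: which features pushed it towards 'like'?
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(explanation.as_list())  # e.g. [('feature_3 > 0.52', 0.21), ...]
```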

"So, what is a black-box AI system?", you might ask. Well, as the name implies, it is a system, which takes input (e.g., data), does 'its magic', and outputs 'something' (e.g., data, action). Typically, the black in black box refers to the 'magic' part. However, it is certainly possible that the input is also shielded. As such, for the observer, only the outcome is visible. So, the observer sees the outcome, which may or may not be useful. But they have no idea how the system derived to that outcome.

"OK, but is this a problem? " Good question as it all depends... If you have seen our Trust-in-AI framework then you may recall the Context layer. It is this layer, which consists of elements like ethics, regulation, compliance, business, and industry, which play an important role to determine whether a black box AI system is a problem or not.

For instance, regulation might force you to have insight into the AI system you use or develop. In my GDPR blog post, I discuss this in detail. Maybe you simply want to do the 'right thing' from an ethical point of view, because you want your AI system to be transparent. Or from a business perspective, you might say, "I don't really care at this stage. If my AI system causes something bad to happen, I will cross that bridge when I get there."

That's one reason why, in reality, it is not a binary matter and why we should talk about "shades of black" instead.

Another reason is that the "shade of black" depends on the eye of the beholder. For instance, when you watch Netflix/AmazonVideo/Hulu, you probably get movie/TV show recommendations (depending on your settings). These recommendations are based on your viewing history and preferences. You as an observer see these recommendations, but may have no idea how the service came up with them. As such, for you this system is a black box. Maybe not pitch black, as you may get the gist of it (e.g., the recommendations you receive may indicate that you like action movies). But you don't have insight into all the details, like which data is used in the recommender system, which AI model/algorithm is used, and so on. On the flip side: for the developers of the system, the system may not be a black box at all, as they can explain exactly how it all works, which data is used, and how the system arrives at its recommendations.

But we can take this a step further. Not only can we have different shades of black between observers, we can also have different shades for each observer, depending on the actual model used. For this, we need to distinguish between ex-ante and ex-post explanations:

  • Ex-ante: we can explain how the system is designed and how it works prior to the actual use of the model.
  • Ex-post: we can explain how the model has worked, including its outcomes, after it has been put to use.

In the case of a black box, either or both of these explanations have a shade of black, at least in the eye of the beholder (observer).

Let's go back to our Recommender System (RS) example. And for the sake of the discussion, let's define four shades of black: (NB) No Black (so essentially full transparency), (LB) Light Black, (MB) Medium Black, and (PB) Pitch Black. And let's assume we only have two observers: (EU) the end user (people like you and me) and (SD) the system developer(s).

A possible combination of all these elements is (also written out in a small sketch after the list):

  • EU:
    • Ex-ante RS = PB, in case we (safely) assume that a typical end user has no idea how the recommender system is built.
    • Ex-post RS = NB, LB, MB, or PB, depending on whether the end user can figure out how the system made its recommendations.
  • SD:
    • Ex-ante RS = NB, in case we (safely) assume that the developers know how the system works, as they have designed it.
    • Ex-post RS = NB, LB, MB, or PB, depending on the actual model/algorithm used for the RS.
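Purely as an illustration, the same combination can be written down as a small lookup table. The shade values below just mirror the bullets above; they are assumptions for the sake of the example, not measurements.

```python
# Shades of black, from full transparency to completely opaque
SHADES = ["NB", "LB", "MB", "PB"]

# One possible assessment of the recommender system (RS) example;
# None means "any of NB/LB/MB/PB, it depends"
rs_shades = {
    ("EU", "ex-ante"): "PB",   # end user has no idea how the RS is built
    ("EU", "ex-post"): None,   # depends on what the end user can figure out
    ("SD", "ex-ante"): "NB",   # developers designed the system themselves
    ("SD", "ex-post"): None,   # depends on the model/algorithm used
}

print(rs_shades[("SD", "ex-ante")])  # -> 'NB'
```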

So, in this case, we have different shades between the observers (ex-ante and possibly ex-post), as well as different shades between ex-ante and ex-post for each observer.

Let me explain.

You might wonder how it is possible for a developer to have, say, "SD ex-post RS = MB" and not just "SD ex-post RS = NB". As I already indicated, this is related to the actual model/algorithm used. For instance, if the developer uses a Neural Network (NN), there is a point where even the developer can no longer explain certain details of the NN after it has been trained. Ex-ante, the shade can still be 'NB', as the developer can explain the NN in terms of the hyperparameters used (e.g., the number of layers and nodes, the optimizer, the learning rate, regularization), the training/validation/test sets, and so on.

But one of the key characteristics of an NN is that, given the hyperparameters (which are ex-ante), the NN 'finds' (calculates) its own set of 'optimal' weights. However, although the developers know ex-ante how the NN works, they don't know (exactly) why the weights end up with exactly those values. The reason is the huge number of combinations of layers, nodes, weights, and data that needs to be 'crunched' (calculated) by the model. Especially in the case of large models (which is the case for most, if not all, recommender systems), tracing these calculations is simply not feasible for humans.
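As a rough sketch (with made-up data and hyperparameters), the snippet below trains a small neural network: every ex-ante design choice is right there in the constructor, yet the thousands of learned weights that actually produce the predictions are just numbers that nobody can individually account for.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy data standing in for a recommender's training set (hypothetical)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Ex-ante: every design choice is explicit and explainable
model = MLPClassifier(
    hidden_layer_sizes=(64, 32),   # two hidden layers
    solver="adam",                 # optimizer
    learning_rate_init=0.001,      # learning rate
    alpha=1e-4,                    # L2 regularization
    max_iter=500,
    random_state=0,
)
model.fit(X, y)

# Ex-post: the trained weights exist, but no one can say *why*
# each individual value is what it is
n_weights = sum(w.size for w in model.coefs_)
print(f"The network learned {n_weights} weights")
```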

On the other hand, if the developers had used, say, a Random Forest as the model, they could explain the ex-post much better, possibly to the point that "SD ex-post RS = NB". Granted, the number of decision trees that make up the Random Forest might be large, but it is still manageable for humans, in particular when looking at indicators like 'feature importance', which show which features matter most for the model's decisions.
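Again as a rough sketch on the same kind of made-up data, a Random Forest offers an ex-post handle that the neural network above lacks: its feature importances summarize which inputs drive the model's decisions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same kind of made-up data as in the neural-network sketch
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# Ex-post: feature importances give a human-readable summary
# of which inputs matter most for the model's decisions
for idx in np.argsort(rf.feature_importances_)[::-1][:5]:
    print(f"feature {idx}: importance {rf.feature_importances_[idx]:.3f}")
```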

Another question on your mind might be: 'How is it possible for an end user to have, say, "EU ex-post RS = NB, LB, MB, or PB" and not just "EU ex-post RS = PB"?' If it were 'NB', then the end user has essentially 'reverse engineered' the RS, or at least its recommendations; the actual model might still be a black box. In reality, given the complexity of most recommender systems, this is highly unlikely, but it is theoretically possible.

If it were 'LB', then the end user has a decent understanding of how the RS makes its recommendations; in the case of 'MB', they have some understanding; and in the case of 'PB', they have no understanding at all. In reality, for most end users it is 'PB', with 'MB' for some end users.

It would be interesting to do this 'test' yourself and see which shade applies to you!

The next important question is: "Can an AI system be trusted despite having shades of black? After all, aren't explainability and transparency requirements for building trust?" The answer might surprise you, as I will explain in my next blog post.

Thanks for reading.

Until then, enjoy AI and I hope to 'see' you in my next blog post!