Thou shall not explain AI models without caution
In many domains, it is increasingly important to be able to explain the decisions made by AI models. Explanations may be necessary to build trust in algorithms and to support high-stakes decisions. They are also often a legal requirement: the right to an explanation is part of the EU General Data Protection Regulation (GDPR). While the need to explain AI is pressing, there are many concerns about the current methodological approaches to this problem.
Is it a black-box model?
Everyone seems to agree that a proprietary model whose computing logic is not accessible to the user is a black box. However, there is no consensus on how to categorize AI models whose source code is transparent but whose decision-making process is too complex for a human to follow in reasonable time. Typical examples of such models are random forests and deep neural networks. Some also refer to these models as black boxes, while others do not. For the rest of this discussion, we will categorize them separately and call such models “non-simulatable”. We will reserve the term “black box” for proprietary models only.
But what about decision trees or rule-based models? Well, their categorization depends on their size: if a model is fully transparent and a human can follow its computation in reasonable time, we will call it a “simulatable” model. Decision trees or rule-based models of relatively small size fall into this category. However, unwieldy rule lists or very large decision trees fall into the non-simulatable category.
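To make the distinction concrete, here is a minimal sketch using scikit-learn (the dataset, tree depth and number of trees are arbitrary choices made for illustration): a shallow decision tree can be printed and simulated by hand, whereas a random forest of hundreds of equally transparent trees cannot be followed in reasonable time.

```python
# Rough illustration of the simulatable / non-simulatable distinction
# (dataset and hyperparameters are arbitrary choices for this sketch).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# A shallow decision tree: its full decision logic fits on a screen,
# so a human can follow ("simulate") every prediction step by step.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(small_tree, feature_names=iris.feature_names))

# A random forest of hundreds of trees: each tree is transparent,
# but tracing one prediction through all of them is impractical,
# which is what we call "non-simulatable" here.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
print(f"Trees a human would need to walk through: {len(forest.estimators_)}")
```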
How did we get here?
Black-box and non-simulatable models have achieved strong performance in many domains, especially in image recognition and natural language processing. Their success is due, among other factors, to their ability to identify patterns in data with limited feature engineering and to the availability of “off-the-shelf” libraries. They have been the models of choice in many academic and commercial use cases. But how can we explain the predictions or decisions made by such models?
When an explanation is required, it is common practice to apply methodologies that analyze AI models post hoc, i.e., after the model has been trained (Figure 1); a minimal sketch of such an analysis follows the list below. The goal is to address the lack of transparency of black-box and non-simulatable models and help:
- understand and validate the behavior of the model
- identify edge cases and anticipate potential model failures
- gain the trust of end users and internal stakeholders.
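One possible sketch of such a post hoc analysis, assuming scikit-learn and its permutation importance utility (the synthetic data and model below are placeholders for illustration): the trained model is treated purely as a prediction function, and the explanation is derived by probing it after training.

```python
# Post hoc analysis sketch: explain an already-trained model by probing it,
# here with permutation importance (data and model are placeholders).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the (non-simulatable) model first; the explanation step comes afterwards.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Post hoc step: shuffle one feature at a time on held-out data and measure
# how much the model's score degrades.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: mean importance drop = {result.importances_mean[i]:.3f}")
```

An analysis like this can support the goals above: it helps validate that the model relies on sensible features, and it flags features whose perturbation exposes potential failure modes.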
One model, several explanations
AI models deployed in industry typically have multiple stakeholders: developers, domain experts, regulatory entities, management, and end users who could ultimately be affected by the outcome of the models.
Local vs global explanations
Explanations can be local or global. A local explanation accounts for a single decision or prediction, while a global explanation provides insights into the behavior of the model over the entire dataset.
Developers and product owners are generally interested in the global behavior of the model, whereas users are generally interested in understanding the decision made in their specific case. A local explanation typically differs from a global one because of the averaging effects inherent in the global explanation.
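The contrast between the two, and the averaging effect just mentioned, can be sketched with a deliberately simple linear model, where a feature's contribution to a prediction is just its coefficient times its value (the dataset and the choice of a linear model are assumptions made for illustration, not a full attribution method):

```python
# Local vs. global explanations with a simple linear model:
# per-feature contribution = coefficient * feature value.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

data = load_diabetes()
X, y = data.data, data.target
model = Ridge(alpha=1.0).fit(X, y)

# Local explanation: contribution of each feature to one specific prediction.
i = 0  # a single instance
local_contrib = model.coef_ * X[i]

# Global explanation: average magnitude of each feature's contribution
# over the whole dataset.
global_contrib = np.abs(model.coef_ * X).mean(axis=0)

for name, loc, glob in zip(data.feature_names, local_contrib, global_contrib):
    print(f"{name:>4}: local = {loc:+.2f}, global (mean |contrib|) = {glob:.2f}")
```

Comparing the two columns for a single instance shows how a feature that matters little on average can dominate one particular prediction, and vice versa.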