Mattia Ferrini, Director

In many domains, it is increasingly critical to be able to explain the decisions taken by AI models. Explanations might be necessary in order to build trust in algorithms and drive high-stakes decisions. Explanations are often a legal requirement: the right to an explanation is part of the EU General Data Protection Regulation. While the need to explain AI is pressing, there are many concerns regarding the current methodological approaches that tackle this problem.

Is it a black-box model?

Everyone seems to agree that a proprietary model whose computing logic is not accessible to the user is a black box. However, there is no consensus on how to categorize AI models whose source code is transparent but that are too complex for a human to follow their decision-making process in reasonable time. Typical examples of such models are random forests and deep neural networks. Some also refer to these models as black boxes, while others do not. For the rest of this discussion, we will categorize them separately and call them “non-simulatable”. We will reserve the term “black box” for proprietary models only.

But what about decision trees or rule-based models? Their categorization depends on their size: if a model is fully transparent and allows a human to follow its computation in reasonable time, we will call it a “simulatable” model. Decision trees or rule-based models of relatively small size fall into this category. However, unwieldy rule lists or very large decision trees fall into the non-simulatable category.
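As a minimal illustration of simulatability, the sketch below (in Python, assuming scikit-learn; the dataset and tree depth are illustrative choices) trains a deliberately small decision tree and prints its complete rule set, which a human can follow end to end.

```python
# A small decision tree is "simulatable": its full decision logic can be
# printed and followed by a human as a handful of if/else rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# Depth 2 keeps the tree small enough to read in its entirety.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# export_text prints the complete set of rules the model applies.
print(export_text(tree, feature_names=list(data.feature_names)))
```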

How did we get here?

Black-box and non-simulatable models have achieved strong performance in many domains, especially in image recognition and natural language processing. Their success rests, among other things, on their ability to identify patterns in data with limited feature engineering and on the availability of “off-the-shelf” libraries. They have been the models of choice in many academic and commercial use cases. But how can we explain the predictions or decisions made by such models?

When an explanation is required, it is common practice to apply methodologies that analyze AI models post hoc, i.e. after the model has been trained (Figure 1). The goal is to address the lack of transparency of black-box and non-simulatable models and help:
  • understand and validate the behavior of the model
  • identify edge cases and anticipate potential model failures
  • gain the trust of end users and internal stakeholders.

One model, several explanations

AI models deployed in industry typically have multiple stakeholders: developers, domain experts, regulatory entities, management and end users who could ultimately be affected by the outcome of the models.

Local vs global explanations

Explanations can be local or global. A local explanation explains a single decision or prediction. A global explanation provides insights into the behavior of the model over the entire dataset.

Developers and product owners are generally interested in the global behavior of the model. Users are generally interested in understanding the decision made on their specific case. A local explanation typically differs from a global one due to inherent averaging effects present in the global explanation.
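To make the distinction concrete, here is a minimal sketch in Python, assuming the shap library (discussed below) and a scikit-learn random forest; dataset and model choices are purely illustrative. It computes a local attribution for one prediction and a global importance obtained by averaging attributions over the dataset.

```python
# Local vs. global explanations via SHAP attributions (illustrative sketch).
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # shape: (n_samples, n_features)

# Local explanation: feature attributions for one single prediction.
local_attribution = shap_values[0]

# Global explanation: mean absolute attribution over the whole dataset.
global_importance = np.abs(shap_values).mean(axis=0)
```

Because the global view averages over many instances, the feature ranking it suggests can differ from the ranking that drives any particular local explanation.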

The dangers of post-hoc local explanations

Black-box and non-simulatable models often require local explanations, and these can come in many different forms. For instance, saliency maps are often used to explain image classification results.

Methodologies such as SHAP and LIME generate simple models (surrogates) that offer an explanation for a single data point, i.e. for one specific decision or prediction. Local surrogates are inherently interpretable models: their output provides insights that can be understood by a human with basic mathematics and simple logical deduction. However, we believe that the use of such local post-hoc explanation methods should be treated with extreme caution, especially when the use case requires strict guarantees that explanations are faithful (see the sketch after the list below). A local explanation method might in fact:
  • employ completely different features compared to the initial model for the very same prediction or decision.
  • result in a lack of robustness, as the explanations might be very different for very close data points.
  • provide little information about how exactly the initial model performed its computation.
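As a minimal sketch of how such a local surrogate is obtained in practice, the example below uses the lime package with a scikit-learn classifier; the dataset, model and parameters are illustrative assumptions. The surrogate's weights are an approximation fitted around one instance, not the original model's actual computation.

```python
# Fitting a post-hoc local surrogate with LIME around a single prediction.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# The surrogate is a simple weighted linear model fitted on perturbed samples
# around this one instance; its weights constitute the "explanation".
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())   # (feature condition, weight) pairs
```

Nothing in this procedure guarantees that the surrogate's weights reflect the features the original model actually relied on, which is precisely the faithfulness concern raised above.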

Counterfactual explanations

Acknowledging the limitations of many post-hoc local explainability methods, the use of counterfactuals has been advocated in a broad variety of regulatory contexts. Counterfactual explanations shift the focus from “how was this decision made” to “what can be done to change this decision”. They provide users with the smallest change they would need to make in order to obtain the desired outcome, a framing that tends to resonate well with end users.
However, there are significant risks associated with the use of counterfactual explanations (see the sketch after this list), including:
  • Suggested changes may not always be actionable.
  • Suggested changes may impact other dependent features that could in turn affect the decision in the non-desired direction.
  • Feature normalization is dataset dependent and does not always align with the users’ metrics of “cost of change”.
  • Counterfactuals could constitute an implicit recommendation in an industry where recommendations are legally prohibited.
  • Counterfactuals require that the underlying AI model remains unchanged.
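To illustrate the mechanics, below is a minimal, hand-rolled sketch of a counterfactual search for a scikit-learn-style classifier: it greedily perturbs a single feature until the prediction flips. This is a toy illustration rather than any particular published method; production approaches minimize a distance to the original instance under actionability constraints. All names and step sizes here are assumptions.

```python
# Toy counterfactual search: nudge one feature until the model's decision flips.

def simple_counterfactual(model, x, feature_idx, step, max_steps=100):
    """Perturb feature `feature_idx` of instance `x` (a 1-D NumPy feature
    vector) in increments of `step` until the class predicted by `model`
    (scikit-learn predict API) changes; return the modified instance or None."""
    original_class = model.predict(x.reshape(1, -1))[0]
    candidate = x.astype(float).copy()
    for _ in range(max_steps):
        candidate[feature_idx] += step
        if model.predict(candidate.reshape(1, -1))[0] != original_class:
            return candidate          # decision has flipped: a counterfactual
    return None                       # no flip found within the search budget
```

Even this toy example exposes the risks listed above: the perturbed feature may not be actionable, the step size encodes an implicit “cost of change”, moving one feature in isolation may be unrealistic when features are correlated, and the returned counterfactual is only valid for the exact model that produced it.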

Where do we go from here?

Explaining black-box and non-simulatable models is extremely challenging. Are there any other alternatives to explore?

Inherently interpretable models

Progress in inherently interpretable simulatable models has been rapid: from Prototypic Neural Networks to Explainable Boosting Machines, there are now many types of inherently interpretable models that achieve state-of-the-art performance on a wide range of problems. It is often believed that the interpretability of an algorithm must be sacrificed for accuracy and performance (the so-called accuracy-interpretability trade-off). However, this is a common misconception in the AI industry, and the existence of such a trade-off has not been proven.
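As one concrete example, the sketch below trains an Explainable Boosting Machine, assuming the interpret (InterpretML) package and an illustrative scikit-learn dataset; unlike the post-hoc methods discussed earlier, its explanations are read directly from the fitted model.

```python
# An inherently interpretable model: an Explainable Boosting Machine (EBM).
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An EBM is an additive model of per-feature shape functions, so its global
# behavior and each individual prediction are exact properties of the model
# itself rather than post-hoc approximations.
ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X_train, y_train)

global_explanation = ebm.explain_global()                       # per-feature shapes
local_explanation = ebm.explain_local(X_test[:5], y_test[:5])   # per-prediction terms
```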

Especially for datasets with meaningful features, modern simulatable models may even outperform popular types of non-simulatable models such as random forests. Apart from high accuracy, inherently interpretable simulatable models offer:
  • absolute transparency: management, regulatory entities and users can fully understand the decision-making process
  • ease of troubleshooting, fixing and improving: developers and domain experts can iteratively increase model quality before deployment.

Related research articles

[1] Goodman, Bryce, and Seth Flaxman. "European Union regulations on algorithmic decision-making and a “right to explanation”." AI Magazine 38.3 (2017): 50-57.
[2] Slack, Dylan, et al. "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 2020.
[3] Wachter, Sandra, Brent Mittelstadt, and Chris Russell. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harv. JL & Tech. 31 (2017): 841.
[4] Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1.5 (2019): 206-215.
[5] Caruana, Rich, et al. "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015.