Empowerment Week 2 Exercise: Explainability and Transparency

How to use this notebook

Simply read the text and follow the instructions.
This notebook contains code cells, which you can modify and must execute to see their results.
To execute a cell, select it and click on the play button (▶) in the toolbar, or press Shift + Enter or Ctrl + Enter.

As the variables contained in a cell are stored in memory, the order of execution of the cells is important!

Notebook by Maxime Lelièvre, Mattéo Berthet, Cécile Hardebolle and the Responsible software team (2025).
Except where otherwise noted, the content of this notebook is licensed under a Creative Commons Attribution International License (CC BY 4.0 International).


Introduction

Would you trust an algorithm with 95% accuracy without knowing its inner workings? Would you trust it more if its accuracy were 98%? As more and more Machine Learning models are described as "black-box" models, transparency and explainability have become a growing area of Machine Learning research known as Explainable AI (XAI).

In this exercise, we will first explore the current challenges of transparency and explainability in Machine Learning. Then we will apply XAI tools to discover their benefits as well as their shortcomings.

Learning Goals

What will be covered:

  • Part 1: Choose an ML model for a given application
  • Part 2: Attempt to explain the decisions of different models
  • Part 3: Review the limitations of XAI

By the end of the session you will be able to:

  • ✅ Identify interpretability issues with different types of Machine Learning models.
  • ✅ Analyse the importance of features in an interpretable model.
  • ✅ Use an XAI tool called SHAP to obtain a feature importance graph for a deep learning model (global explanation).
  • ✅ Use SHAP to generate local explanations for individual decisions from a deep learning model.
  • ✅ Explain the limitations of current XAI approaches.
In [1]:
%load_ext autoreload
%autoreload 2

# import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

import pickle

import shap

# silence warnings
import warnings
warnings.filterwarnings('ignore')

from res.utils import *

import otter
test = otter.Notebook()

Part 1: Choosing the best ML model

You are working as a Machine Learning engineer/risk analyst in a big public bank. Your team is in charge of designing tools that help the bank make the best decisions on loan applications. Following the exponential growth of AI, your team has been designing Machine Learning models to classify loan applicants for a couple of years. After a lot of research and testing, the Machine Learning team has found two promising models. Your manager asks you to run a comprehensive evaluation to decide which model the bank should deploy in real life.

The bank provides you with a large dataset for your experiment. It contains demographic features for more than 25,000 individuals, together with a label indicating whether they obtained a loan from the bank.

Note: the original dataset used in this notebook is the US Census dataset, which we have modified for the purpose of this story.

Let's first explore the dataset!

1.1 Load and explore the dataset

Instructions

Run the cell below to load the dataset and see its first rows.

Note: The labels are in the last column of the dataframe (Loan granted).

In [2]:
data = load_dataset_eda()
data.head()
Out[2]:
   Age   Workclass         Education-Num  Marital Status      Occupation         Relationship   Race   Sex     Capital Gain  Capital Loss  Hours per week  Loan granted
0  39.0  State-gov         13.0           Never-married       Adm-clerical       Not-in-family  White  Male    2174.0        0.0           40.0            False
1  50.0  Self-emp-not-inc  13.0           Married-civ-spouse  Exec-managerial    Husband        White  Male    0.0           0.0           13.0            False
2  38.0  Private           9.0            Divorced            Handlers-cleaners  Not-in-family  White  Male    0.0           0.0           40.0            False
3  53.0  Private           7.0            Married-civ-spouse  Handlers-cleaners  Husband        Black  Male    0.0           0.0           40.0            False
5  37.0  Private           14.0           Married-civ-spouse  Exec-managerial    Wife           White  Female  0.0           0.0           40.0            False

Here is a brief description of the columns in the dataset:

Feature | Description | Value range / categories
Age | Age of the person in years. | [17, 90]
Workclass | Industry sector of employment. | Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
Education-Num | Highest level of education achieved, represented by a numerical code (/!\ it does not correspond to years of education). | [1, 16]
Marital Status | Marital status. | Married-civ-spouse (civilian spouse), Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse (Armed Forces spouse)
Occupation | Category of occupation. | Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
Relationship | Relationship status (somewhat redundant with marital status). | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
Race | Race category. | White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
Sex | Biological sex. | Female, Male
Capital Gain | Capital gain in the previous year. | [0.0, 99999.0]
Capital Loss | Capital loss in the previous year. | [0.0, 4356.0]
Hours per week | Number of working hours per week. | [1, 99]
Loan granted | Whether the person obtained the loan (the label that the model learns to predict). | True, False

Instructions

Run the cell below to display graphs showing the distribution of the values in the dataset.

Note that not all columns are included; feel free to explore by changing the code below.

In [3]:
categorical_features =  ['Loan granted', 'Relationship', 'Marital Status', 'Race', 'Sex',]
numerical_features = ['Age', 'Education-Num', 'Capital Gain', 'Capital Loss', 'Hours per week']

# plot countplots for some categorical features
fig, axs = plt.subplots(1, len(categorical_features), figsize = (16,5))
for i, col in enumerate(categorical_features):
    sns.countplot(data = data, x = col, ax = axs[i])
    axs[i].set_title(col)
    axs[i].tick_params(axis = 'x', rotation = 90)
plt.tight_layout()
plt.show()

# plot histograms for numerical features
fig, axs = plt.subplots(1, len(numerical_features), figsize = (16,4))
for i, col in enumerate(numerical_features):
    sns.histplot(data[col], ax = axs[i], kde = True)
    axs[i].set_title(col)
plt.tight_layout()
plt.show()
(Output: count plots of the selected categorical features and histograms of the numerical features.)

Reflection time!

Analyze the graphs above to answer the following questions.

Are loans most frequently granted or refused?

In the categorical features, what is the most frequent:

  • Relationship category?
  • Marital status?
  • Race?
  • Sex?

In the numerical features, what is the most frequent value for:

  • Age?
  • Education level?
  • Capital gained in the previous year?
  • Capital lost in the previous year?

Feedback - Click on the "..." below only once you have really tried to answer the question!

  • Loans are about 3 times more often refused than granted.
  • The most frequent relationship category is "Husband".
  • The most frequent marital status is "Married-civ-spouse".
  • The most frequent "race" is "White".
  • The most frequent sex is "Male".
  • The most frequent age is around 36 years old.
  • The most frequent education level is 9 (corresponding to a category).
  • The most frequent capital gained in the previous year is 0.
  • The most frequent capital lost in the previous year is 0.

1.2 Train two classification models

Your goal is to predict whether the applicant will be able to pay back the loan. You have all the features, as well as the label your model should predict, Loan granted, which is either False or True. If your model predicts this label accurately on the dataset, the bank could then use it on new applicants.

The two promising models that your team has identified are:

  • a Logistic Regression classifier (LR): a statistical model based on a mathematical function called the logistic function (to refresh your memory, you can quickly review the introduction to LR from the Fairness 2 module).
  • a Multi-Layer Perceptron (MLP) with ReLU activation: a deep learning model, i.e. an artificial neural network composed of several layers of artificial neurons. Although it computes a mathematical function, that function cannot easily be written down as a simple, readable formula.
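As a reminder of how the logistic function is used, here is a minimal plain-Python sketch: a logistic regression computes a weighted sum of the features plus an intercept and passes it through the logistic (sigmoid) function to obtain a probability. The weights, intercept and applicant values are hypothetical, chosen only for illustration; they are not the coefficients of the model trained later in this notebook.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lr_probability(features, weights, intercept):
    """Probability of the positive class for one applicant: logistic(w . x + b)."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical weights for three scaled features (values in [0, 1])
weights = [2.0, -1.0, 0.5]
intercept = -1.0
applicant = [0.5, 0.0, 0.0]

print(logistic(0.0))                                  # 0.5: the usual decision threshold
print(lr_probability(applicant, weights, intercept))  # z = -1 + 2 * 0.5 = 0, so also 0.5
```

Each weight tells directly in which direction, and how strongly, its feature pushes the probability, which is what makes this model comparatively easy to interpret.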

The two models differ in complexity and interpretability.

Note

Some definitions...

  • Complexity: refers to the degree of intricacy or sophistication of a model, often determined by the number of parameters, layers, and operations it includes.
  • Interpretability: refers to the ease with which humans can understand and make sense of the behavior, outputs, and internal mechanisms of a system or model, often through visualizations or intuitive explanations.

Reflection time!

Based on your intuition at this point, which sentence do you think is correct?

  • A logistic regression model has a lower complexity than a neural network and is usually less interpretable.
  • A logistic regression model has a lower complexity than a neural network but is usually more interpretable.
  • A logistic regression model has a higher complexity than a neural network but is usually less interpretable.
  • A logistic regression model has a higher complexity than a neural network and is usually more interpretable.

Feedback - Click on the "..." below only once you have really tried to answer the question!

A logistic regression model has a lower complexity than a neural network but is usually more interpretable.

Logistic Regression is a highly interpretable model because it can be represented by a mathematical function whose coefficients directly indicate the contribution of each feature to the predicted outcome (more precisely, to the log-odds of the predicted probability), making it easy to explain. In contrast, a Multi-Layer Perceptron (MLP) with ReLU activation is a nonlinear model with multiple hidden layers, whose learned weights and activations are harder to interpret because of their complexity and the lack of a straightforward mapping to the input features. While MLPs can capture more complex relationships in the data, they trade interpretability for predictive power.
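One rough way to compare complexity is to count learnable parameters. The sketch below counts them for a logistic regression on 8 features and for a fully connected network with the layer widths 8, 16, 256, 16 and 1 used by the SmallMLP defined later in this notebook. A fully connected layer from n_in to n_out units has n_in × n_out weights plus n_out biases; activation functions (ReLU, Sigmoid) have no parameters.

```python
def dense_layer_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(layer_sizes):
    """Total parameters of a stack of fully connected layers."""
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

n_features = 8
lr_params = n_features + 1                # one coefficient per feature + intercept
mlp_sizes = [n_features, 16, 256, 16, 1]  # layer widths of SmallMLP

print(lr_params)              # 9
print(mlp_params(mlp_sizes))  # 8625
```

Each of the 9 LR parameters maps to one feature (or to the intercept), whereas the 8,625 MLP parameters are spread across hidden units that have no direct meaning in terms of the input features.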

Instructions

Run the cell below to pre-process the data and create separate datasets to train and test your LR and MLP models:

  • The columns Workclass, Education-Num, Marital Status, Occupation, Relationship, Capital Gain, Capital Loss, Hours per week are the features available for each loan applicant.
  • The Loan granted column is the label to predict.
In [4]:
# load dataset
df = load_dataset_model()

# define features and label
features = ['Workclass', 'Education-Num', 'Marital Status', 'Occupation', 'Relationship', 'Capital Gain', 'Capital Loss', 'Hours per week']
label = 'Loan granted'

# separate features from label (.copy() avoids writing into a view of df)
X = df[features].copy()
y = df[label].tolist()

# scale all features to [0, 1]
scaler = MinMaxScaler()
X[features] = scaler.fit_transform(X[features])

# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f'Length of train set: {len(X_train)} samples')
print(f'Length of test set: {len(X_test)} samples')
Length of train set: 23336 samples
Length of test set: 5834 samples

Note: we have made a number of simplifications in the way we create our models in this toy example, which are not very realistic or representative of the state of the art.

  • We treat all features as numerical features to keep things simple for the interpretability methods we are going to use, but the categorical features should actually be processed differently (e.g. they should be "dummified", i.e. one-hot encoded, as we have seen in other notebooks).
  • We have intentionally left out the Race, Sex and Age columns from our features to reduce bias, but you know from previous notebooks that this is not sufficient: we should also control for proxy attributes and check the balance of the label across the different groups in our dataset.
  • We have scaled all our features to the same value interval to make sure we do not bias their relative importance. However, depending on the data, different scaling strategies (e.g. standardization) lead to different performance with different types of models, which we have not explored here. Moreover, the scaler is fitted on the whole dataset before the train/test split; in practice it should be fitted on the training set only, to avoid leaking information from the test set.
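MinMaxScaler rescales each column independently to [0, 1] with x' = (x - min) / (max - min), where min and max are learned from the data passed to fit. The numpy sketch below reproduces that formula and its inverse (the operation behind inverse_transform, used later to display an applicant's unscaled features). The sample rows are made up for illustration.

```python
import numpy as np

def min_max_fit(X):
    """Per-column minimum and maximum, as learned when fitting the scaler."""
    return X.min(axis=0), X.max(axis=0)

def min_max_transform(X, col_min, col_max):
    """Rescale each column to [0, 1]: (x - min) / (max - min).
    Note: a constant column would divide by zero here; MinMaxScaler guards against that case."""
    return (X - col_min) / (col_max - col_min)

def min_max_inverse(X_scaled, col_min, col_max):
    """Undo the scaling: x = x' * (max - min) + min."""
    return X_scaled * (col_max - col_min) + col_min

# Made-up rows: [Education-Num, Hours per week]
X_demo = np.array([[9.0, 40.0], [13.0, 50.0], [16.0, 99.0], [1.0, 1.0]])
col_min, col_max = min_max_fit(X_demo)
X_scaled = min_max_transform(X_demo, col_min, col_max)
print(X_scaled[1])                                     # row 1 becomes about 0.8 and 0.5
print(min_max_inverse(X_scaled, col_min, col_max)[1])  # back to about 13 and 50
```

Because every feature ends up in the same [0, 1] interval, the LR coefficients of different features can be compared with each other later on.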

Instructions

Run the cell below to define and train our logistic regression model, stored in the variable lr_model.

In [5]:
# Logistic regression model, defined as "lr_model"
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)
Out[5]:
LogisticRegression()
Parameters (all default values): penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='deprecated', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None

Reflection time!

Regarding the 2 code cells above, which statement is not correct?

  • The dataset has been split into two sets: one for training and one for testing.
  • The model has been initialized and trained on the whole dataset.
  • The model has been initialized and trained on the train dataset.
  • The features and the label have been first separated from the original dataset.

Feedback - Click on the "..." below only once you have really tried to answer the question!

The model has been initialized and trained on the whole dataset.

Instructions

Run the cells below to create the neural network model. This is done in 3 steps, in 3 separate cells:

  1. definition of the model (layers etc.),
  2. training,
  3. loading of the trained weights.

You don't need to understand the details of the neural network model; it is included here for the curious.

Note: we have already trained the neural network model for you.
The training code is shown below for the curious, but please don't run it: training such a model requires relatively heavy computation, so redoing it would be unnecessary and would have avoidable environmental impacts.

In [6]:
import torch
import torch.nn as nn

# Define the neural network
class SmallMLP(nn.Module):
    def __init__(self):
        super(SmallMLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(8, 16),
            nn.ReLU(),
            nn.Linear(16, 256),
            nn.ReLU(),
            nn.Linear(256, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
    
    def predict_proba(self, x):
        # take a numpy array as input (e.g. df.values, or the arrays passed by SHAP)
        # and return raw predictions between 0 and 1, with shape (n_samples, 1)
        x = torch.tensor(x, dtype=torch.float32)
        y = self.forward(x).detach().numpy()
        return y
    
    def predict_proba_row(self, x):
        # take a pandas dataframe as input and return raw predictions between 0 and 1,
        # with the singleton output dimension removed
        x = torch.tensor(x.values, dtype=torch.float32)
        y = self.forward(x).squeeze().detach().numpy()
        return y
    
    def predict(self, x):
        # take a pandas dataframe as input and return binary predictions (True/False)
        # as a 1-D array, using a cutoff of 0.5 on the predicted probability
        y = self.predict_proba(x.values)
        return y.ravel() > 0.5
In [7]:
# import torch.optim as optim
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler
# from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


# device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# # Convert data to tensors
# X_train_tensor = torch.tensor(X.values, dtype=torch.float32)
# y_train_tensor = torch.tensor(y, dtype=torch.float32)

# # Split the data into train and validation sets
# X_train, X_val, y_train, y_val = train_test_split(X_train_tensor, y_train_tensor, test_size=0.2, random_state=42)

# # Move data to the selected device
# X_train, X_val = X_train.to(device), X_val.to(device)
# y_train, y_val = y_train.to(device), y_val.to(device)

# # Create a DataLoader
# train_dataset = TensorDataset(X_train, y_train)
# train_loader = DataLoader(train_dataset, batch_size=32)

# # Instantiate the model
# model = SmallMLP().to(device)  # Move the model to the selected device

# # Define loss and optimizer
# criterion = nn.BCELoss()
# optimizer = optim.Adam(model.parameters(), lr=0.00005)

# # Training loop
# num_epochs = 100

# best_val_loss = float('inf')

# for epoch in range(num_epochs):
#     model.train()
#     total_loss = 0

#     for batch_X, batch_y in train_loader:
#         optimizer.zero_grad()
        
#         outputs = model(batch_X).squeeze()
#         loss = criterion(outputs, batch_y)
        
#         loss.backward()
#         optimizer.step()
        
#         total_loss += loss.item()

#     # Validation
#     model.eval()
#     with torch.no_grad():
#         val_outputs = model(X_val).squeeze()
#         val_loss = criterion(val_outputs, y_val)
#         val_preds = (val_outputs > 0.5).float()
#         accuracy = (val_preds == y_val).float().mean()
#         if val_loss < best_val_loss:
#             best_val_loss = val_loss
#             torch.save(model.state_dict(), 'small_mlp_dict.pth')
#             print(f"Model saved at epoch {epoch+1}")
    
#     print(f"Epoch {epoch+1}/{num_epochs}, Loss: {total_loss / len(train_loader):.4f}, Val Loss: {val_loss.item():.4f}, Val Accuracy: {accuracy.item():.4f}")
In [8]:
# Load the model
mlp_model = SmallMLP()
mlp_model.load_state_dict(torch.load('models/small_mlp.pth', map_location=torch.device('cpu')))
mlp_model.eval()
Out[8]:
SmallMLP(
  (model): Sequential(
    (0): Linear(in_features=8, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=16, bias=True)
    (5): ReLU()
    (6): Linear(in_features=16, out_features=1, bias=True)
    (7): Sigmoid()
  )
)

1.3 Which has the best accuracy?

Now that both models are trained, we can evaluate their performance on the test set. Remember that the features are in X_test, and the ground truth is in y_test.

Instructions

The cell below generates the predictions from the models for the test set and stores the results in y_lr and y_mlp.

Complete the code to calculate the accuracy for both models.
We suggest that you use the function accuracy_score from sklearn.metrics:

  • It takes two arguments: an array with the "ground truth" (y_test in our case) and an array with the predictions from the model.
  • It returns the accuracy as a value between 0 and 1.
In [9]:
from sklearn.metrics import accuracy_score

# Predictions
y_lr = lr_model.predict(X_test)
y_mlp = mlp_model.predict(X_test)

# Compute accuracy
### YOUR CODE HERE ###
lr_acc = accuracy_score(y_test, y_lr) # SOLUTION
mlp_acc = accuracy_score(y_test, y_mlp) # SOLUTION
#####################

print(f"Logistic regression model accuracy: {lr_acc*100:.2f}%")
print(f"Neural network model accuracy: {mlp_acc*100:.2f}%")
Logistic regression model accuracy: 84.06%
Neural network model accuracy: 84.33%
In [10]:
test.check("accuracy")
Out[10]:

accuracy
passed! 🚀

Reflection time!

Which model would you choose to deploy for the bank? Choose the option that best describes your reasoning.

  • The MLP model because its accuracy is higher.
  • The LR model because its accuracy is lower.
  • The MLP model because a complex model will always perform better in reality than a less complex one.
  • The LR model because it will be easier to understand how it makes decisions, even though it is not the most accurate.
  • None of them. I do not have enough information to decide.

Feedback - Click on the "..." below only once you have really tried to answer the question!

None of them. I do not have enough information to decide.

Here are some reasons:

  • Interpretability vs. Complexity: In banking, transparency is important, so a simpler model might be preferable despite a slightly lower accuracy.
  • Fairness and Bias: The model must be evaluated for potential biases to ensure fairness in decision-making.
  • Real-world Performance: A complex model may overfit, leading to poor generalization in practice.

1.4 Is accuracy enough?

As an ethically trained engineer, you know that accuracy is not enough to evaluate the quality of a model.
In particular, you remember from your Responsible Software course that you should always look at the different types of errors hidden behind the accuracy measure, especially the False Positives and the False Negatives!

Instructions

Complete the code below to compute the confusion matrix for each model based on our test set.

We suggest that you use the function confusion_matrix from sklearn.metrics:

  • It takes two arguments: an array with the "ground truth" (y_test in our case) and an array with the predictions from the model.
  • It returns a 2-dimensional numpy array of the form: [[tn, fp], [fn, tp]].
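As a quick check of that layout, here is a toy example with made-up labels. With boolean labels, scikit-learn orders the classes as False then True, so rows are the true class, columns the predicted class, and `.ravel()` unpacks the four counts in the order tn, fp, fn, tp.

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions (True = loan granted)
y_true_demo = [False, False, False, True, True, True]
y_pred_demo = [False, True, True, True, False, True]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)
# [[1 2]
#  [1 2]]

tn, fp, fn, tp = cm_demo.ravel()
print(tn, fp, fn, tp)  # 1 2 1 2
```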
In [11]:
from sklearn.metrics import confusion_matrix

### YOUR CODE HERE ###
cm_lr = confusion_matrix(y_test, y_lr) # SOLUTION
cm_mlp = confusion_matrix(y_test, y_mlp) # SOLUTION
######################

display(cm_lr)
display(cm_mlp)
array([[4128,  275],
       [ 655,  776]])
array([[4095,  308],
       [ 606,  825]])
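The accuracies printed earlier can be recovered from these matrices, together with error rates that separate the two kinds of mistakes. The sketch below simply copies the counts displayed above and uses the standard definitions: false positive rate = fp / (fp + tn) and false negative rate = fn / (fn + tp).

```python
def summarize(cm):
    """Accuracy and error rates from a 2x2 matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "false_positive_rate": fp / (fp + tn),  # truly refused applicants who would get a loan
        "false_negative_rate": fn / (fn + tp),  # truly granted applicants who would be refused
    }

# Counts copied from the confusion matrices displayed above
cm_lr_counts = [[4128, 275], [655, 776]]
cm_mlp_counts = [[4095, 308], [606, 825]]

for name, cm_counts in [("LR", cm_lr_counts), ("MLP", cm_mlp_counts)]:
    stats = summarize(cm_counts)
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Both false negative rates exceed 40%: a large share of applicants who actually obtained a loan would be refused by either model, something the single accuracy figure hides.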
In [12]:
test.check("confusion_matrix")
Out[12]:

confusion_matrix
passed! 🌟

Instructions

Run the cell below to plot the confusion matrices.

In [13]:
plot_confusion_matrices(cm_lr, cm_mlp)
(Output: confusion matrices of the LR and MLP models.)

Reflection time!

Which model would you choose to deploy for the bank? Choose the option that best describes your reasoning.

  • The MLP model because it has fewer False Negatives: it will reject fewer loan applications that would have been paid back, and thus miss fewer opportunities for people who want to take a loan.
  • The LR model because it has fewer False Positives: it will accept fewer loan applications that would not have been paid back, and thus reduce the risks for the bank.
  • None of them.

Feedback - Click on the "..." below only once you have really tried to answer the question!

In our scenario, we have the following consequences for errors:

  • The false positives mean a risk of losing money for the bank, as they represent people who get a loan but will not pay it back.
  • The false negatives mean a lost opportunity to gain money for the bank as they represent people who do not get a loan even though they would actually pay it back.

While the MLP model performs worse on false positives (308 - 275 = 33 more people not paying back their loan compared to the LR model), this could be compensated by its better performance on false negatives (655 - 606 = 49 more customers compared to the LR model). However, the difference is not very large, and differences in the loan amounts could offset it.
So, all in all, it would depend on the risk policy of the bank.
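One way to make "it depends on the risk policy" concrete is to assign a cost to each type of error and compare total costs. The costs below are purely hypothetical placeholders in arbitrary monetary units; only the error counts come from the confusion matrices above.

```python
def total_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of errors: defaulted loans (fp) plus missed good customers (fn)."""
    return fp * cost_fp + fn * cost_fn

errors = {"LR": (275, 655), "MLP": (308, 606)}  # (false positives, false negatives)

# Hypothetical scenarios: (cost of one false positive, cost of one false negative)
for cost_fp, cost_fn in [(1.0, 1.0), (2.0, 1.0)]:
    costs = {name: total_cost(fp, fn, cost_fp, cost_fn) for name, (fp, fn) in errors.items()}
    best = min(costs, key=costs.get)
    print(f"cost_fp={cost_fp}, cost_fn={cost_fn}: {costs}, cheaper: {best}")

# The MLP is cheaper exactly when 33 * cost_fp < 49 * cost_fn, i.e. when a false
# positive costs less than 49 / 33 (about 1.48) times as much as a false negative.
print(49 / 33)
```

With equal costs the MLP comes out ahead; as soon as a defaulted loan costs twice as much as a missed customer, the LR model does.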

Part 2: Explaining model decisions

You proudly present your results to the bank's managerial board and recommend that they adopt the MLP model on the basis of your analysis above.
However, the CEO asks you: "That's very nice, but on what basis do your models make their decisions? Our bank has very strong ethical values and we are accountable to our clients: we must be able to explain how we decide whether to grant loans."

So you go back to your lab and work on identifying how your models make decisions. You start with the Logistic Regression model.

2.1 Feature Importance Analysis

The goal of Feature Importance Analysis is to determine the relative importance of each feature in the decisions made by a machine learning model, expressed as a feature importance score. It provides insight into the inner workings of the model and is particularly useful for interpretability, as it allows us to explain the model's decisions to stakeholders and domain experts. It can also guide feature selection and engineering, helping to improve the model's performance.

The way in which feature importance scores are determined depends on the type of model.
For Logistic Regression, these scores are very simple to obtain, as the importance of each feature is given almost directly by the model's coefficients (provided the features are on comparable scales, which our min-max scaling aims to ensure).
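plot_feature_importance_analysis comes from the course's res.utils module and its code is not shown here. As a hedged sketch of the underlying idea only, the snippet below fits a scikit-learn LogisticRegression on synthetic data (three hypothetical features, already in [0, 1] like our scaled data) and ranks the features by the absolute value of `coef_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
demo_feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical features

X_demo = rng.uniform(0, 1, size=(2000, 3))
# Synthetic log-odds: strong positive effect of feature_a, weaker negative
# effect of feature_b, no effect of feature_c
logits = 6 * X_demo[:, 0] - 1.5 * X_demo[:, 1] - 2
y_demo = rng.uniform(size=2000) < 1 / (1 + np.exp(-logits))

demo_model = LogisticRegression().fit(X_demo, y_demo)
importance = np.abs(demo_model.coef_[0])  # one coefficient per feature
ranking = sorted(zip(demo_feature_names, importance), key=lambda t: t[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

Taking absolute values discards the direction of each effect: here feature_b lowers the predicted probability while feature_a raises it, and only the sign of `coef_` tells them apart.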

Instructions

Run the cell below to see the feature importance analysis.

Note: we display here the absolute values of the coefficients so that they can be compared with the MLP model later.

In [14]:
plot_feature_importance_analysis(lr_model, X_train)
(Output: bar chart of the absolute coefficients of the LR model.)

Reflection time!

Which features play the most important role in the predictions from the model?

Feedback - Click on the "..." below only once you have really tried to answer the question!

We can see that the model relies heavily on the capital gain and the education level to make its predictions, which seems to make sense from a causal point of view. However, features like occupation or workclass do not seem to play a role, whereas relationship and marital status do, which is perhaps more surprising. This is a good example of how feature importance analysis can help us understand which patterns the model relies on for prediction, and identify potential biases or issues in the data.

Reflection time!

What could be the limitations of a feature importance analysis?

Feedback - Click on the "..." below only once you have really tried to answer the question!

While feature importance provides valuable insights into model behavior, it has limitations. These include:

  1. Univariate Analysis: Feature importance scores are calculated individually for each feature, ignoring potential interactions between features.
  2. Model-Specific: Feature importance is specific to the model used, and different models may assign different importance scores to the same features.
  3. Non-linear Relationships: Feature importance may not capture non-linear relationships between features and the target variable.
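A small synthetic case illustrates the first and third limitations. Below, the label is true exactly when two binary features differ (an XOR pattern), so it is entirely determined by the features; yet a logistic regression gives both of them coefficients of essentially zero, so judged by its coefficients neither feature looks important. The data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# All four combinations of two binary features, repeated to form a balanced dataset
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 250, dtype=float)
# XOR label: True exactly when the two features differ (a pure interaction)
y_xor = X_xor[:, 0] != X_xor[:, 1]

xor_model = LogisticRegression().fit(X_xor, y_xor)
print(np.abs(xor_model.coef_[0]))     # both coefficients are (close to) zero
print(xor_model.score(X_xor, y_xor))  # accuracy no better than chance
```

Adding an explicit interaction feature (x0 * x1) or using a non-linear model would capture this pattern, but the importance of each original feature would then no longer be given by a single coefficient.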

2.2 Explainable AI (XAI) with SHAP

Now you would like to do the same with the MLP model. Unfortunately, it is not as simple as with the LR model: the weights of the MLP model do not directly represent the importance of the different features (and there are far too many of them to allow any interpretation). This is a characteristic of neural network models. For this type of model, you need a post-hoc interpretability method to obtain information on how it makes decisions.

There are several ways of doing this, but one well-recognized approach is called SHAP (SHapley Additive exPlanations). SHAP can be applied to any machine learning model, whatever its type (the KernelExplainer used below only needs access to the model's prediction function). Introduced in a 2017 paper by Lundberg and Lee, it is based on game theory concepts, in particular Shapley values: each feature receives an importance score reflecting its contribution to the difference between the model's prediction for a given input and a baseline prediction (the average prediction over a background dataset). This approach helps to evaluate the impact of individual features on model predictions, aiding in model debugging, feature engineering and model selection.
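To make the Shapley-value idea concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model with three features. A feature that is absent from a coalition takes a baseline value, and each feature's Shapley value is its marginal contribution averaged over all coalitions with the standard weights |S|! (n - |S| - 1)! / n!. KernelExplainer estimates the same kind of quantity by sampling coalitions and averaging over a background dataset instead of a single baseline point, since enumerating all 2^n coalitions quickly becomes too expensive.

```python
from itertools import combinations
from math import factorial

def exact_shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    A feature outside a coalition is replaced by its baseline value.
    Cost grows as 2^n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # features in the coalition keep their value from x, the others take the baseline value
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # hypothetical model: linear terms plus an interaction between features 0 and 1
    return 2 * z[0] + z[0] * z[1] - 0.5 * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley_values(toy_model, x, baseline)
print(phi)                                           # approximately [3.0, 1.0, -1.5]
print(sum(phi), toy_model(x) - toy_model(baseline))  # both approximately 2.5
```

The interaction term is split equally between features 0 and 1 (1.0 each, on top of feature 0's linear effect of 2.0), and the values add up to the difference between the prediction and the baseline prediction: this is the "additive" in SHapley Additive exPlanations.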

2.2.1 Generating the SHAP values for our model

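Overall, KernelSHAP treats our model as a black box: it only calls the model's prediction function, on modified versions of the input data in which some features are replaced by values taken from a background dataset, and uses these predictions to estimate the Shapley values of the different features.
The SHAP library we're using allows us to create an object called an "explainer", which we can use to obtain so-called "explanations" for decisions from our MLP model.
In this notebook we use the explainer KernelExplainer(), with a background dataset summarized as 100 weighted k-means centroids of the training set (shap.kmeans(X_train, 100)).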

Note

We have already generated the SHAP values for you.
The code is shown below for the curious, but please don't run it: it requires relatively heavy computation, so redoing it would be unnecessary and would have avoidable environmental impacts.

In [15]:
# explainer = shap.KernelExplainer(mlp_model.predict_proba, shap.kmeans(X_train, 100) )

# shap_values = explainer(X_test)

# with open('kernel_shap_values.pkl', 'wb') as f:
#     pickle.dump(shap_values, f)
    
# print('SHAP values saved')

# shap_values.values = shap_values.values.squeeze()

# shap.plots.bar(shap_values)

2.2.2 Global explanation: feature importance

Now that we have the SHAP values, we can evaluate the importance of the different features in the decisions made by our model.

Instructions

Run the cell below to load the precomputed SHAP values and display the feature importance for our MLP model.

In [16]:
# create the explainer (kept for reference: the SHAP values below were precomputed with it)
explainer = shap.KernelExplainer(mlp_model.predict_proba, shap.kmeans(X_train, 100))

# load the precomputed SHAP values
with open('res/kernel_shap_values.pkl', 'rb') as f:
    shap_values = pickle.load(f)
print('SHAP values loaded')

# drop the singleton output dimension so each row holds one SHAP value per feature, then display
shap_values.values = shap_values.values.squeeze()
shap.plots.bar(shap_values)
SHAP values loaded
(Output: SHAP global bar plot of the mean absolute SHAP value of each feature.)

How to understand this plot?

The plot above is a global bar plot. Here's a quick explanation of how to read it:

The plot has 3 main components:

  • The Y-axis: represents individual features of the dataset (typically the columns of the dataframe passed as an argument).
  • The X-axis: the mean absolute SHAP value of each feature, averaged over all applicants of the test set. It quantifies the overall impact of the feature on the model's predictions.
  • The horizontal bars: represent each feature's average impact on the model's prediction. The length of the bar indicates the magnitude of the feature's impact, with longer bars signifying more influential features.

Reading the plot:

  1. Feature Importance: The features are ordered from top to bottom based on their mean absolute SHAP value, with the most impactful feature at the top. This helps to quickly identify which features are most significant in the model.
  2. Magnitude of Impact: Each bar's length on the X-axis reflects how much each feature contributes, on average, to the model's predictions. For instance, if the bar for "Marital Status" extends further than the bar for "Occupation", it means that "Marital Status" has a greater overall influence on the model's predictions than "Occupation."

This is called a global explanation, as it provides an overview of the model's behavior across the entire test dataset.
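For a matrix of SHAP values with one row per applicant and one column per feature, the bar lengths correspond to the mean of the absolute values of each column. The numpy sketch below shows this aggregation on a small made-up matrix (the feature names are only labels here); with the notebook's objects, the same computation could presumably be applied to shap_values.values once it has been squeezed to shape (n_samples, n_features), as in the cell above.

```python
import numpy as np

demo_feature_names = ["Relationship", "Education-Num", "Capital Gain"]  # labels only
# Made-up SHAP values: 4 applicants x 3 features
shap_matrix = np.array([
    [ 0.20, -0.05, 0.00],
    [-0.15,  0.10, 0.40],
    [ 0.25, -0.10, 0.00],
    [-0.10,  0.05, 0.00],
])

global_importance = np.abs(shap_matrix).mean(axis=0)  # mean |SHAP| per feature
order = np.argsort(global_importance)[::-1]           # most important first, as in the bar plot
for idx in order:
    print(f"{demo_feature_names[idx]}: {global_importance[idx]:.3f}")
```

Taking absolute values matters: the first column has large positive and negative contributions that would mostly cancel out in a plain average (0.05 here), even though that feature has a sizeable effect on every applicant.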

Reflection time!

According to this global explanation, which features play the most important role in the predictions from the MLP model? Are the most important features the same for the MLP and the LR models?

Feedback - Click on the "..." below only once you have really tried to answer the question!

Based on this global explanation, we can identify differences and similarities in the importance the two models give to the different features:

  • the most important feature for the MLP model is relationship, which is much more surprising than the capital gain feature for the LR model
  • both models use the education level as their second most important feature
  • the MLP model puts way more importance on the marital status than the LR model, for which it was only the 6th feature in terms of importance
  • neither model attributes much importance to occupation and workclass

This variation in feature importance highlights how different models can reflect distinct data patterns, even when trained on the same dataset.

2.2.3 Local explanation: focusing on one decision

The CEO of the bank is quite happy with the global explanation you have provided. But he wants to ensure that the bank is also able to explain individual decisions as this would be important for the clients.

You can also get SHAP values for single rows of the test dataset, so you can show the CEO that your SHAP explanations include:

  • The prediction provided by the MLP model: a probability between 0 and 1 that the customer will repay the loan (1 meaning the loan will be repaid).
    You have defined a cutoff value at $0.5$ to decide whether to grant the loan.
  • The cumulative contribution of the different features to the prediction given by the model, represented by their SHAP values provided by the explainer.

Instructions

Run the cell below which will:

  • Display the features of the applicant with identifier 363.
    Note: remember we have scaled all our features in the preprocessing stage, so here we have reverse-transformed the features of the applicant to obtain unscaled values that are interpretable.
  • Display the SHAP values explaining the prediction provided by the MLP model for this applicant.
In [17]:
# retrieve the data of applicant 363 and unscale them
applicant_363 = scaler.inverse_transform([shap_values[363].data]).round()
applicant_363_df = pd.DataFrame(applicant_363, columns=features, index=['363'])
display(applicant_363_df)

# plot the shap values
shap.plots.waterfall(shap_values[363], max_display=20)
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
363 | 6.0 | 13.0 | 2.0 | 12.0 | 4.0 | 0.0 | 0.0 | 50.0
[Figure: SHAP waterfall plot for applicant 363]

How to understand this plot?

The plot above is a waterfall plot. Here's a quick explanation of how to read it:

The plot has 3 main components:

  • The Y-axis: represents individual features of the dataset (typically the columns of the dataframe passed as an argument).
  • The X-axis: the SHAP values. The SHAP values quantify the impact of each feature on the model's prediction.
  • The horizontal bars: represent each feature's SHAP value and its impact (positive or negative). The length of the bar indicates the magnitude of the feature's impact.

Reading the plot:

  1. Prediction: The value $f(x) = 0.537$ represented at the top of the graph is the output from our MLP model. If it is higher than the cutoff you have defined at 0.5, as is the case here, it means that the model predicts a loan should be granted to the applicant.

  2. Baseline Value: The plot starts at a baseline value, $E[f(X)]$, which is the average prediction of the model over the dataset. This is the starting point on the X-axis. We can see on the plot that $E[f(X)] = 0.248$: the average predicted probability is below the cutoff of 0.5, so on average the model does not grant the loan.

  3. Feature Contribution: Each feature's contribution is added sequentially, showing how it increases or decreases the prediction from the baseline.
    Note: the value indicated beside each feature is its scaled value obtained after pre-processing the data; the unscaled values are displayed in the dataframe above (e.g. 0.8 for Education-Num corresponds to an unscaled value of 13).
    Example:

    • The workclass decreased the probability of the loan being granted by 0.05 (5 percentage points) for this specific applicant.
    • The education level increased the probability of the loan being granted by 0.16 (16 percentage points) for this specific applicant.

  4. Cumulative Effect: As you move up the plot, each bar adds to the cumulative prediction, illustrating the combined effect of all feature contributions.
    The bars that push the prediction to the right (positive impact) increase the likelihood of the positive class, while bars that push to the left (negative impact) decrease it.
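The example in point 3, where a scaled value of 0.8 corresponds to an unscaled Education-Num of 13, is consistent with min-max scaling, $z = (x - \min)/(\max - \min)$, if Education-Num ranges from 1 to 16: $(13 - 1)/(16 - 1) = 0.8$. Assuming the notebook's `scaler` performs min-max scaling (this is not stated here), the sketch below shows the forward and inverse mapping for this single feature. It also shows one reason to round after an inverse transform. The bounds 1 and 16 are assumptions.

```python
# Assumption: the notebook's `scaler` behaves like a min-max scaler and
# Education-Num ranges from 1 to 16 in the training data. Both are assumptions.
EDU_MIN, EDU_MAX = 1.0, 16.0

def minmax_scale(x, lo, hi):
    # Map the interval [lo, hi] onto [0, 1].
    return (x - lo) / (hi - lo)

def minmax_unscale(z, lo, hi):
    # Inverse mapping: what an inverse transform does for one column.
    return z * (hi - lo) + lo

scaled = minmax_scale(13.0, EDU_MIN, EDU_MAX)
# Rounding guards against tiny floating-point errors (e.g. 12.999999...).
unscaled = round(minmax_unscale(scaled, EDU_MIN, EDU_MAX))
print(scaled, unscaled)
```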

The waterfall plot is a visualization for a single sample. It shows how the prediction for this specific instance is derived from the baseline by summing the contributions of each feature. For example, in this plot, the model's average prediction is below the cutoff, meaning that on average applicants are not granted a loan ($E[f(X)] = 0.248 < 0.5$), but the chosen sample is predicted to be granted a loan since $f(x) = 0.537 > 0.5$.
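The waterfall plot relies on a property of SHAP values called local accuracy (or additivity). The baseline plus the sum of all feature contributions equals the prediction: $f(x) = E[f(X)] + \sum_i \phi_i$. For applicant 363, the contributions therefore add up to about $0.537 - 0.248 = 0.289$, up to rounding. The sketch below checks this property on a hypothetical linear model instead of the notebook's MLP. A linear model is used because, when features are treated as independent, its SHAP values have a closed form: $\phi_i = w_i (x_i - E[X_i])$. All weights and data are made up.

```python
import numpy as np

# Hypothetical linear model f(x) = b + w . x (not the notebook's MLP).
toy_w = np.array([0.3, -0.2, 0.5])
toy_b = 0.1

def toy_model(X):
    return toy_b + X @ toy_w

# Hypothetical background data that defines the baseline E[f(X)].
toy_background = np.array([
    [0.0, 1.0, 0.2],
    [1.0, 0.0, 0.4],
    [0.5, 0.5, 0.0],
])
toy_x = np.array([0.9, 0.1, 0.8])  # the single sample being explained

toy_base_value = toy_model(toy_background).mean()
# For a linear model with independent features, the SHAP value of feature i
# is w_i * (x_i - mean of feature i over the background data).
toy_phi = toy_w * (toy_x - toy_background.mean(axis=0))

toy_prediction = toy_model(toy_x)
print(f"E[f(X)] = {toy_base_value:.3f}")
print("SHAP values:", np.round(toy_phi, 3))
print(f"E[f(X)] + sum of SHAP values = {toy_base_value + toy_phi.sum():.3f}, f(x) = {toy_prediction:.3f}")
print("granted" if toy_prediction > 0.5 else "refused")  # same 0.5 cutoff as in the notebook
```

Here the baseline is 0.25 and the three contributions sum to 0.5, giving the prediction 0.75. This is exactly what a waterfall plot would draw, bar by bar.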

This is called a local explanation, as it provides an explanation for a specific applicant of our dataset.

Reflection time !

Based on the plot above, select the only correct option:

  • For the given sample, the feature Workclass did not have an impact on the model's prediction.
  • For the given sample, the feature Education-Num had a positive impact on the model's prediction.
  • For the given sample, the feature Relationship had the most important impact in the model's prediction.
  • For the given sample, the features Capital Loss and Capital Gain had the same impact on the model's prediction.

Feedback - Click on the "..." below only once you have really tried to answer the question!

For the given sample, the feature Education-Num had a positive impact on the model's prediction.

All the others are incorrect:

  • For the given sample, the feature Workclass did not have an impact on the model's prediction.
    => The value of the Workclass feature made the cumulative SHAP value decrease by 0.05.
  • For the given sample, the feature Relationship had the most important impact in the model's prediction.
    => The feature Education-Num (SHAP value of 0.16) had the most impact on the model's prediction, ahead of the feature Relationship (SHAP value of 0.1).
  • For the given sample, the features Capital Loss and Capital Gain had the same impact on the model's prediction.
    => The feature Capital Loss made the cumulative SHAP value decrease by 0.01 whereas the feature Capital Gain made it decrease by 0.04.

2.3 [Optional] Spreading trust among users

Your CEO is happy! And he was right to ask you to do this since, two weeks later, a customer comes to the bank to ask why his application was refused and what he could do to maximize his chances on his next application. You are in charge of his case.

Your first step is to retrieve the information of this client and display the SHAP values for the decision given by the MLP model.

Instructions

Complete the cell below to display the information from the database for the customer with id 1113 and the SHAP values corresponding to the decision from the MLP model.

In [18]:
customer_id = 1113 # SOLUTION

# retrieve the data of applicant 1113 and unscale them
applicant_1113 = scaler.inverse_transform([shap_values[customer_id].data]).round()
applicant_1113_df = pd.DataFrame(applicant_1113, columns=features, index=[customer_id])
print(f'Data for customer {customer_id}:')
display(applicant_1113_df)

# plot the shap values
print(f'SHAP values for customer {customer_id}:')
shap.plots.waterfall(shap_values[customer_id], max_display=20)
Data for customer 1113:
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
1113 | 0.0 | 11.0 | 2.0 | 0.0 | 5.0 | 0.0 | 0.0 | 35.0
SHAP values for customer 1113:
[Figure: SHAP waterfall plot for customer 1113]

With the plot above you have:

  • The output of the MLP model, which indicates that the loan has been refused for this customer as $f(x) = 0.381 < 0.5$
  • A SHAP explanation for this customer which indicates the contributions of the different features.

But to be able to justify the bank's decision to the customer, you want to understand how a different situation for the customer would have changed the decision from the model.
One of your colleagues has designed a simulation where you can change the values of the different features and see how the output of the model would change (both the outcome and the explanation).

Instructions

In the cell below:

  • select a feature you want to change and put its name in the variable feature_to_change
  • select the new value for the feature and put it in the variable new_value

Then run the cell to see the result on the output of the model and the corresponding explanation.
Your goal is to find at least 3 changes (a feature and its new value) that would each get the customer the loan (i.e. bring $f(x)$ above 0.5).

You can then complete the variables first_solution, second_solution and third_solution 2 cells below to see if your solution is correct.

In [19]:
### PARAMETERS ###
feature_to_change = "Capital Gain" # SOLUTION
new_value = 2000 # SOLUTION
##################
simulation_customer(customer_id, mlp_model, shap_values, explainer, X_test, feature_to_change, new_value, scaler)
Previous data for customer 1113:
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
1113 | 0.0 | 11.0 | 2.0 | 0.0 | 5.0 | 0.0 | 0.0 | 35.0
Modified data for customer 1113:
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
1113 | 0.0 | 11.0 | 2.0 | 0.0 | 5.0 | 2000 | 0.0 | 35.0
New SHAP values for customer 1113:
[Figure: SHAP waterfall plot for customer 1113 with Capital Gain set to 2000]
In [20]:
### Your results here ###
# solution = {"Feature name": value}
first_solution = {"Capital Gain": 1600.0} # SOLUTION
second_solution = {"Occupation": 3.0} # SOLUTION
third_solution = {"Education-Num": 13.0} # SOLUTION

Instructions

Run the cell below to test whether your changes would have changed the prediction. If you get an error message, check the feature names for typos and make sure you follow the format solution = {"Feature name": value}.

In [21]:
test_simulation(first_solution, second_solution, third_solution, customer_id, mlp_model, X_test, scaler)
The first solution is correct
The second solution is correct
The third solution is correct
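The search you did by hand is a simple form of counterfactual explanation: what would have to change for the decision to flip? The sketch below automates it for one feature at a time. For each feature, a bisection search finds the smallest value that brings the predicted probability up to the 0.5 cutoff. It is not the notebook's `simulation_customer` or `test_simulation` helper. It uses a hypothetical logistic model with made-up weights, so its thresholds do not correspond to the MLP's. Bisection also assumes that the prediction increases monotonically with the chosen feature, which a real MLP does not guarantee.

```python
import math

# Hypothetical logistic model on unscaled features (NOT the notebook's MLP).
TOY_WEIGHTS = {"Capital Gain": 0.0004, "Education-Num": 0.25}
TOY_BIAS = -3.4
CUTOFF = 0.5

def toy_predict_proba(applicant):
    z = TOY_BIAS + sum(TOY_WEIGHTS[name] * applicant[name] for name in TOY_WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def smallest_flipping_value(applicant, feature, low, high, tol=1e-3):
    """Bisection search for the smallest value of `feature` in [low, high]
    whose predicted probability reaches CUTOFF.
    Assumes the prediction increases monotonically with `feature`."""
    def proba_at(value):
        # Copy the applicant and change only the chosen feature.
        return toy_predict_proba({**applicant, feature: value})

    if proba_at(low) >= CUTOFF:
        return low          # already granted at the lower bound
    if proba_at(high) < CUTOFF:
        return None         # no value in the range flips the decision
    while high - low > tol:
        mid = (low + high) / 2
        if proba_at(mid) >= CUTOFF:
            high = mid      # the decision boundary is at or below mid
        else:
            low = mid       # the decision boundary is above mid
    return high

toy_applicant = {"Capital Gain": 0.0, "Education-Num": 11.0}
print(toy_predict_proba(toy_applicant))  # below the cutoff: refused
print(smallest_flipping_value(toy_applicant, "Capital Gain", 0.0, 5000.0))
print(smallest_flipping_value(toy_applicant, "Education-Num", 0.0, 16.0))
print(smallest_flipping_value(toy_applicant, "Education-Num", 0.0, 13.0))  # None: no flip in range
```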

2.4 Model debugging

While your CEO is happy, one of your colleagues is actually worried: they suspect that the model has learned an incorrect pattern from the data.
They ask you to help debug the model. To start your analysis, you decide to get a closer look at the SHAP values for the same customer as in the previous section.

Instructions

Run the cell below to plot the SHAP values for the customer.

In [22]:
shap.plots.waterfall(shap_values[customer_id], max_display=20)
[Figure: SHAP waterfall plot for customer 1113]

Instructions

You decide to focus on the feature Capital Loss: set it to 1000 and see what happens.

In [23]:
### PARAMETERS ###
feature_to_change = "Capital Loss" # SOLUTION
new_value = 1000 # SOLUTION
##################
simulation_customer(customer_id, mlp_model, shap_values, explainer, X_test, feature_to_change, new_value, scaler)
Previous data for customer 1113:
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
1113 | 0.0 | 11.0 | 2.0 | 0.0 | 5.0 | 0.0 | 0.0 | 35.0
Modified data for customer 1113:
ID | Workclass | Education-Num | Marital Status | Occupation | Relationship | Capital Gain | Capital Loss | Hours per week
1113 | 0.0 | 11.0 | 2.0 | 0.0 | 5.0 | 0.0 | 1000 | 35.0
New SHAP values for customer 1113:
[Figure: SHAP waterfall plot for customer 1113 with Capital Loss set to 1000]

Reflection time !

Does it make sense according to you?

Feedback - Click on the "..." below only once you have really tried to answer the question!

No, it does not. A capital loss of 1000 should have a negative impact on the model's prediction, as the applicant is more likely to be in a difficult financial situation and less likely to pay back the loan. However, here it increases the probability of the loan being granted by 9 percentage points.

This is an interesting finding! You can try even higher values for Capital Loss, such as 3000, and the issue remains the same.

Instructions

Run the following cell to display the distribution of the feature Capital Loss in the original dataset. The y-axis represents the number of samples and the x-axis the value of the feature.

In [24]:
plot_capital_loss_distribution(df)
[Figure: histogram of Capital Loss values in the original dataset]

Reflection time !

How many samples have a Capital Loss greater than 0? What is the ratio of these samples compared to the total number of samples?

Where does the weird behavior of the model come from?
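The first question can be answered exactly with pandas rather than by reading the histogram. A boolean mask marks the rows with a positive Capital Loss: its sum is the count and its mean is the ratio. The sketch below runs on a small hypothetical DataFrame. On the real data you would apply the same two lines to `df["Capital Loss"]`.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`
# (named differently so that running this cell does not overwrite `df`).
toy_df = pd.DataFrame({"Capital Loss": [0, 0, 0, 1902, 0, 0, 1887, 0, 0, 0]})

has_loss = toy_df["Capital Loss"] > 0   # boolean mask, True where a loss was reported
n_nonzero = int(has_loss.sum())         # number of samples with a Capital Loss > 0
loss_ratio = has_loss.mean()            # fraction of all samples (True counts as 1)
print(f"{n_nonzero} samples with a capital loss, i.e. {loss_ratio:.1%} of the data")
```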

Feedback - Click on the "..." below only once you have really tried to answer the question!

All bars for a Capital Loss greater than 0 are extremely small compared to the bar at 0, which reaches almost 28,000. This means that the model has seen too few samples with a Capital Loss greater than 0 to learn how this feature should affect its predictions, which is why it behaves oddly when the Capital Loss is greater than 0.

A possible solution would be to:

  • Resample the dataset to decrease the ratio of samples with a Capital Loss of 0.
  • Use synthetic data generation techniques to generate more samples with a Capital Loss greater than 0.
  • Use adversarial training to make the model more robust to these outliers.
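The sketch below illustrates the first option with random oversampling in pandas, on hypothetical toy data. Rows with a Capital Loss greater than 0 are drawn with replacement until both groups have the same size. Undersampling the rows with a Capital Loss of 0 would be the opposite way to reach the same ratio. Oversampling only duplicates existing rows, so it adds no new information. It should also be applied to the training data only, so that the test set still reflects the real distribution.

```python
import pandas as pd

# Hypothetical training data (not the notebook's dataset).
toy_train = pd.DataFrame({
    "Capital Loss":   [0, 0, 0, 0, 0, 0, 0, 0, 1902, 1887],
    "Hours per week": [40, 35, 50, 40, 45, 38, 40, 60, 40, 50],
})

has_loss = toy_train["Capital Loss"] > 0
minority = toy_train[has_loss]    # rows with a capital loss (rare)
majority = toy_train[~has_loss]   # rows with a Capital Loss of 0

# Draw minority rows with replacement until both groups have the same size,
# then shuffle the combined rows.
minority_up = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = (
    pd.concat([majority, minority_up])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(len(balanced), (balanced["Capital Loss"] > 0).mean())
```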

Part 3: Limitations of Explainable AI

Let's discuss in more depth the limitations of Explainable AI and their implications when using it in practice.

3.1 Post-Hoc Explainability

One of the main limitations when using tools such as SHAP (the library we used in the previous part) or LIME (another popular library for explainability) is that they provide post-hoc explanations.

Post-hoc methods explain the predictions of a trained model without being an inherent part of the model itself. While useful, these explanations:

  • May not perfectly align with the true internal logic of the model, especially for highly complex or non-linear architectures.
  • Are vulnerable to instability, meaning small perturbations in the input data can lead to significant changes in the explanation.
  • Are dependent on the chosen method's assumptions, which can lead to inconsistent or misleading insights across different tools.
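A toy illustration of the last two points uses a hypothetical model $f(x) = \max(x_1, x_2)$ and an all-zero baseline. The sketch compares two attribution methods on the same model and input. The first is exact Shapley values, the quantity SHAP estimates, computed here with features outside a coalition replaced by the baseline. The second is leave-one-out occlusion, where each feature is reset to the baseline on its own. At $x = (1.0, 0.99)$, leave-one-out gives 0.01 to $x_1$ and 0 to $x_2$, even though both inputs are nearly equal. Swapping the inputs by 0.01 moves that credit entirely to $x_2$. The Shapley values split the output almost evenly in both cases. The model and inputs are made up, and the SHAP library approximates rather than enumerates for real models.

```python
from itertools import combinations
from math import factorial

def toy_max_model(x):
    # Hypothetical toy model: the output is the larger of the input features.
    return max(x)

def coalition_value(f, x, baseline, coalition):
    # Features outside the coalition are replaced by their baseline value.
    z = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return f(z)

def exact_shapley(f, x, baseline):
    # Exact Shapley values by enumerating every coalition (only feasible for few features).
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (coalition_value(f, x, baseline, s | {i})
                                   - coalition_value(f, x, baseline, s))
        phi.append(total)
    return phi

def leave_one_out(f, x, baseline):
    # Occlusion: drop in output when a single feature is reset to its baseline.
    everyone = set(range(len(x)))
    return [f(x) - coalition_value(f, x, baseline, everyone - {i}) for i in range(len(x))]

toy_baseline = [0.0, 0.0]
for toy_x in ([1.0, 0.99], [0.99, 1.0]):
    print(toy_x,
          "Shapley:", [round(v, 3) for v in exact_shapley(toy_max_model, toy_x, toy_baseline)],
          "leave-one-out:", [round(v, 3) for v in leave_one_out(toy_max_model, toy_x, toy_baseline)])
```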

Post-hoc explainability is therefore helpful for debugging and understanding models, but it should not be viewed as a definitive explanation of how the model works.

3.2 Towards Self-Explainable Neural Networks

Neural networks are increasingly used in practice due to their high accuracy and scalability. They are often chosen over traditional machine learning models that are interpretable by design, such as linear regression or decision trees.

However, as discussed in the first part of this notebook, neural networks are often considered black boxes, meaning their inner workings are difficult to interpret. This opacity creates challenges in high-stakes applications where understanding model predictions is critical for trust, compliance, and debugging.

To address this, researchers are exploring self-explainable neural networks, which are designed to be interpretable by construction. One prominent example is the architecture proposed by David Alvarez-Melis and Tommi Jaakkola in their 2018 paper, "Towards Robust Interpretability with Self-Explaining Neural Networks". Their approach balances the trade-off between accuracy and interpretability by incorporating interpretable components directly into the model's architecture.
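As a rough summary of that paper's formulation (see the paper for the exact definitions), a self-explaining network computes its prediction from concepts $h_i(x)$ and relevance scores $\theta_i(x)$, both produced by the network, combined by an aggregation function $g$:

$$ f(x) = g\big(\theta_1(x)\,h_1(x), \ldots, \theta_k(x)\,h_k(x)\big) $$

With $g$ a sum, this becomes $f(x) = \theta(x)^\top h(x)$. This is a linear model in the concepts whose coefficients $\theta(x)$ may change with the input, and the explanation for an input is read directly from $\theta(x)$. To keep these coefficients meaningful, training adds a regularizer that pushes the model to behave locally like such a linear model, where $J_x^h$ is the Jacobian of $h$ with respect to $x$:

$$ \mathcal{L}_\theta = \big\lVert \nabla_x f(x) - \theta(x)^\top J_x^h(x) \big\rVert $$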

Challenges with Self-Explaining Neural Networks:

  • The field is still emerging, with fewer practical implementations compared to post-hoc methods.
  • Models are often harder to train due to the added constraints for interpretability.
  • Current self-explaining architectures can achieve similar accuracy to traditional neural networks but require more computational and design effort.

Despite these challenges, self-explainable neural networks represent a promising step towards making complex ML models interpretable.

3.3 What Should We Do in Practice?

When deciding how to approach interpretability in practice, you should consider the complexity of the task and the constraints of your project.

For Simpler Models:

  • Use interpretable models by design such as:
    • Logistic regression
    • Decision trees
    • Generalized additive models (GAMs)
  • These models offer transparency with little or no need for additional explanation tools.
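Logistic regression shows concretely what "interpretable by design" means. Each coefficient is the change in the log-odds of the positive class per unit of its feature. Increasing that feature by one unit therefore multiplies the odds by $e^{\beta}$, whatever the values of the other features. The sketch below checks this on a hypothetical model with made-up coefficients. No post-hoc tool is needed, because the coefficient itself is the explanation.

```python
import math

# Hypothetical fitted logistic-regression model (made-up coefficients, in log-odds per unit).
lr_coefficients = {"Education-Num": 0.35, "Hours per week": 0.03}
lr_intercept = -6.0

def lr_proba(applicant):
    # Logistic regression: sigmoid of a weighted sum of the features.
    z = lr_intercept + sum(lr_coefficients[k] * applicant[k] for k in lr_coefficients)
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

applicant_a = {"Education-Num": 13.0, "Hours per week": 40.0}
applicant_b = {**applicant_a, "Education-Num": 14.0}  # one more education level

# One extra unit of Education-Num multiplies the odds by exp(coefficient),
# whatever the value of Hours per week.
odds_ratio = odds(lr_proba(applicant_b)) / odds(lr_proba(applicant_a))
print(f"odds ratio: {odds_ratio:.3f}, exp(coef): {math.exp(lr_coefficients['Education-Num']):.3f}")
```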

For More Complex Models:

  • If you require high accuracy and scalability and must use complex models (e.g., deep neural networks):
    • Post-hoc explainability is often the most practical choice unless you can train a self-explaining neural network.
    • To make the most of post-hoc methods:
      1. Use multiple explainability tools (e.g., SHAP, LIME) to cross-validate explanations and identify consistent patterns.
      2. Recognize the limitations of these methods and interpret results with caution.
      3. Use post-hoc explanations primarily to:
        • Debug the model.
        • Gain insights into its general behavior.
        • Understand feature importance or sensitivity.
      4. Avoid using these methods as standalone explanations for stakeholders, as their output may oversimplify or misrepresent the model's complexity.
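One simple way to cross-validate explanations from several tools (step 1) is to check whether they agree on the top-k most important features. The sketch below computes this overlap for two hypothetical global importance dictionaries, for example one derived from SHAP and one from LIME. The numbers are made up, and top-k overlap is only one of several possible agreement measures.

```python
# Hypothetical global importances from two different explainability tools.
importance_tool_a = {"Relationship": 0.15, "Education-Num": 0.12, "Marital Status": 0.09,
                     "Capital Gain": 0.05, "Workclass": 0.02}
importance_tool_b = {"Education-Num": 0.20, "Relationship": 0.18, "Capital Gain": 0.07,
                     "Marital Status": 0.06, "Workclass": 0.01}

def top_k(importance, k):
    # Names of the k features with the largest importance.
    return set(sorted(importance, key=importance.get, reverse=True)[:k])

def top_k_agreement(imp_a, imp_b, k):
    # Fraction of the top-k features that both tools share (1.0 = same set).
    return len(top_k(imp_a, k) & top_k(imp_b, k)) / k

for k in (2, 3):
    print(k, top_k_agreement(importance_tool_a, importance_tool_b, k))
```

Here the two tools agree on the two most important features but not on the third. That is a signal to treat the lower part of either ranking with caution.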

Final Recommendation:

In critical applications where interpretability is paramount (e.g., healthcare, finance), prioritize interpretable-by-design models or self-explaining architectures where feasible. Reserve post-hoc methods for exploratory analysis or as supplementary tools.

Congratulations! You have finished this notebook!