Introduction: Advanced Explainable AI for computer vision#

pip install grad-cam

This is a package with state of the art methods for Explainable AI for computer vision. This can be used for diagnosing model predictions, either in production or while developing models. The aim is also to serve as a benchmark of algorithms and metrics for research of new explainability methods.

⭐ Comprehensive collection of Pixel Attribution methods for Computer Vision.

⭐ Tested on many Common CNN Networks and Vision Transformers.

⭐ Advanced use cases: Works with Classification, Object Detection, Semantic Segmentation, Embedding-similarity and more.

⭐ Includes smoothing methods to make the CAMs look nice.

⭐ High performance: full support for batches of images in all methods.

⭐ Includes metrics for checking if you can trust the explanations, and tuning them for best performance.



What it does


Weight the 2D activations by the average gradient


Like GradCAM but element-wise multiply the activations with the gradients; provably guaranteed faithfulness for certain models


Like GradCAM but element-wise multiply the activations with the gradients then apply a ReLU operation before summing


Like GradCAM but uses second order gradients


Like GradCAM but scale the gradients by the normalized activations


Zero out activations and measure how the output drops (this repository includes a fast batched implementation)


Perbutate the image by the scaled activations and measure how the output drops


Takes the first principle component of the 2D Activations (no class discrimination, but seems to give great results)


Like EigenCAM but with class discrimination: First principle component of Activations*Grad. Looks like GradCAM, but cleaner


Spatially weight the activations by positive gradients. Works better especially in lower layers


Computes the gradients of the biases from all over the network, and then sums them

Deep Feature Factorizations

Non Negative Matrix Factorization on the 2D activations

Visual Examples#

What makes the network think the image label is ‘pug, pug-dog’

What makes the network think the image label is ‘tabby, tabby cat’

Combining Grad-CAM with Guided Backpropagation for the ‘pug, pug-dog’ class

Object Detection and Semantic Segmentation#

Object Detection

Semantic Segmentation

Explaining similarity to other images / embeddings#

Deep Feature Factorization#










Vision Transfomer (Deit Tiny):#








Swin Transfomer (Tiny window:7 patch:4 input-size:224):#








Metrics and Evaluation for XAI#

Chosing the Target Layer#

You need to choose the target layer to compute CAM for. Some common choices are:

  • FasterRCNN: model.backbone

  • Resnet18 and 50: model.layer4[-1]

  • VGG and densenet161: model.features[-1]

  • mnasnet1_0: model.layers[-1]

  • ViT: model.blocks[-1].norm1

  • SwinT: model.layers[-1].blocks[-1].norm1

If you pass a list with several layers, the CAM will be averaged accross them. This can be useful if you’re not sure what layer will perform best.

Using from code as a library#

from pytorch_grad_cam import GradCAM, HiResCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM, FullGrad
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision.models import resnet50

model = resnet50(pretrained=True)
target_layers = [model.layer4[-1]]
input_tensor = # Create an input tensor image for your model..
# Note: input_tensor can be a batch tensor with several images!

# Construct the CAM object once, and then re-use it on many images:
cam = GradCAM(model=model, target_layers=target_layers, use_cuda=args.use_cuda)

# You can also use it within a with statement, to make sure it is freed,
# In case you need to re-create it inside an outer loop:
# with GradCAM(model=model, target_layers=target_layers, use_cuda=args.use_cuda) as cam:
#   ...

# We have to specify the target we want to generate
# the Class Activation Maps for.
# If targets is None, the highest scoring category
# will be used for every image in the batch.
# Here we use ClassifierOutputTarget, but you can define your own custom targets
# That are, for example, combinations of categories, or specific outputs in a non standard model.

targets = [ClassifierOutputTarget(281)]

# You can also pass aug_smooth=True and eigen_smooth=True, to apply smoothing.
grayscale_cam = cam(input_tensor=input_tensor, targets=targets)

# In this example grayscale_cam has only one image in the batch:
grayscale_cam = grayscale_cam[0, :]
visualization = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)

Metrics and evaluating the explanations#

from pytorch_grad_cam.utils.model_targets import ClassifierOutputSoftmaxTarget
from pytorch_grad_cam.metrics.cam_mult_image import CamMultImageConfidenceChange
# Create the metric target, often the confidence drop in a score of some category
metric_target = ClassifierOutputSoftmaxTarget(281)
scores, batch_visualizations = CamMultImageConfidenceChange()(input_tensor, 
  inverse_cams, targets, model, return_visualization=True)
visualization = deprocess_image(batch_visualizations[0, :])

# State of the art metric: Remove and Debias
from pytorch_grad_cam.metrics.road import ROADMostRelevantFirst, ROADLeastRelevantFirst
cam_metric = ROADMostRelevantFirst(percentile=75)
scores, perturbation_visualizations = cam_metric(input_tensor, 
  grayscale_cams, targets, model, return_visualization=True)

# You can also average accross different percentiles, and combine
# (LeastRelevantFirst - MostRelevantFirst) / 2
from pytorch_grad_cam.metrics.road import ROADMostRelevantFirstAverage,
cam_metric = ROADCombined(percentiles=[20, 40, 60, 80])
scores = cam_metric(input_tensor, grayscale_cams, targets, model)

Smoothing to get nice looking CAMs#

To reduce noise in the CAMs, and make it fit better on the objects, two smoothing methods are supported:

  • aug_smooth=True

    Test time augmentation: increases the run time by x6.

    Applies a combination of horizontal flips, and mutiplying the image by [1.0, 1.1, 0.9].

    This has the effect of better centering the CAM around the objects.

  • eigen_smooth=True

    First principle component of activations*weights

    This has the effect of removing a lot of noise.


aug smooth

eigen smooth

aug+eigen smooth

Running the example script:#

Usage: python --image-path <path_to_image> --method <method>

To use with CUDA: python --image-path <path_to_image> --use-cuda

You can choose between:

GradCAM , HiResCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM , LayerCAM, FullGrad and EigenCAM.

Some methods like ScoreCAM and AblationCAM require a large number of forward passes, and have a batched implementation.

You can control the batch size with cam.batch_size =


If you use this for research, please cite. Here is an example BibTeX entry:

  title={PyTorch library for CAM methods},
  author={Jacob Gildenblat and contributors},

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks Rachel L. Draelos, Lawrence Carin
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks Aditya Chattopadhyay, Anirban Sarkar, Prantik Howlader, Vineeth N Balasubramanian
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu
Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. Saurabh Desai and Harish G Ramaswamy. In WACV, pages 972–980, 2020
Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yulan Guo, Yinghui Gao, Biao Li
Eigen-CAM: Class Activation Map using Principal Components Mohammed Bany Muhammad, Mohammed Yeasin
LayerCAM: Exploring Hierarchical Class Activation Maps for Localization Peng-Tao Jiang; Chang-Bin Zhang; Qibin Hou; Ming-Ming Cheng; Yunchao Wei
Full-Gradient Representation for Neural Network Visualization Suraj Srinivas, Francois Fleuret
Deep Feature Factorization For Concept Discovery Edo Collins, Radhakrishna Achanta, Sabine Süsstrunk