This notebook uses the approach of generating bounding boxes from Class Activation Maps (CAM). CAM was introduced by Bolei Zhou et al. in the paper Learning Deep Features for Discriminative Localization and is a great tool for model interpretation.

Grad CAM is a variation of CAM that was introduced in Grad-CAM: Why Did You Say That? Visual Explanations from Deep Networks via Gradient-based Localization

Grad CAM is similar to CAM in that it is a weighted combination of feature maps but followed by a ReLU. The output of Grad CAM is a “class-discriminative localization map” where the hot part of the heatmap corresponds to a particular class.

You can view this notebook on github

CAM uses the output of the last convolutional layer to provide a heatmap visualization. A way of acccessing the activations during the training phase in Pytorch is done by using a hook. A hook is basically a function that is executed when either the forward or backward pass is called.

Fastai conveniently has a Hook class

class Hook():
    def __init__(self, m):
        self.hook = m.register_forward_hook(self.hook_func)   
    def hook_func(self, m, i, o): self.stored = o.detach().clone()
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.hook.remove()

The gradients of every layer are calculated by PyTorch during the backward pass. To access the gradients you can register a hook on the backward pass and store these gradients.

Again conveniently fastai has the HookBwd class.

class HookBwd():
    def __init__(self, m):
        self.hook = m.register_backward_hook(self.hook_func)   
    def hook_func(self, m, gi, go): self.stored = go[0].detach().clone()
    def __enter__(self, *args): return self
    def __exit__(self, *args): self.hook.remove()

Load the Data and Create DataBlock

The data and DataBlock creation is can be found in this notebook

pneu = untar_data(URLs.SIIM_SMALL)
df = pd.read_csv(f'{pneu}/labels.csv')
p_items = get_dicom_files(f'{pneu}/train')

item_tfms = Resize(266)
batch_tfms = [RandomResizedCropGPU(226), *aug_transforms(do_flip=False, flip_vert=False, max_rotate=10.,
                                                         min_zoom=1., max_zoom=1.1, max_lighting=0.2, max_warp=0.1),

splitter = dicom_splitter(p_items, valid_pct=0.2, seed=7)

pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
                   get_x=lambda x:pneu/f"{x[0]}",
                   get_y=lambda x:x[1],
                   item_tfms = item_tfms,
                   batch_tfms = batch_tfms)

dls = pneumothorax.dataloaders(df.values, bs=32, num_workers=0)

learn = vision_learner(dls, 
                    'resnetblur50' ,
                    cbs=[ShowGraphCallback(), ReduceLROnPlateau(monitor='valid_loss', min_delta=0.1, patience=1),\
                         GradientClip, \
                         SaveModelCallback(monitor='accuracy',fname='siim_small_best',comp=np.greater, with_opt=True)])
200 50, 1e-3)
epoch train_loss valid_loss accuracy time
0 1.252483 0.934061 0.620000 00:05
1 1.278853 1.002197 0.460000 00:05
2 1.280225 0.949744 0.540000 00:05
Better model found at epoch 0 with accuracy value: 0.6200000047683716.
Epoch 1: reducing lr to 0.0001
Epoch 2: reducing lr to 1e-05

Generating the heatmaps

This is the walk through of how the heatmaps are generated. First we grab an image.

test_path = 'C:/Users/avird/.fastai/data/siim_small/test'
test_files = get_dicom_files(test_path)
test_one = test_files[0]

We can use learn.predict to predict the class of the test image. This returns the class, prediction and probabilities

cl, pred, probs = learn.predict(test_one)
print(cl, pred, probs)
No Pneumothorax TensorBase(0) TensorBase([0.9962, 0.0038])

To be able to generate the correct heatmap for the predicted class Pneumothorax we need to pass the predicted tensor value which is 1 when generating the heatmap. Recall that the number of classes in a dataset can be found by doing:

['No Pneumothorax', 'Pneumothorax']

Where No Pneumothorax is class 0 and Pneumothorax is class 1

You can define the predicted tensor like so:

cls = pred.item()

first is handy fastcore function that takes the first item in a list. In this case we can create a list comprehension of files in the test folder and x will be the first item in that folder

x, = first(dls.test_dl([file for file in test_files]))

Generate the image

x_dec = TensorImage(dls.train.decode((x,))[0][0]);

Grab the gradients using Hook and HookBwd and here we are grabbing them from the last layer learn.model[0][-1]

with HookBwd(learn.model[0].model.layer4[-1]) as hookg:
    with Hook(learn.model[0].model.layer4[-1]) as hook:
        output = learn.model.eval()(x.cuda())
        act = hook.stored
    output[0, cls].backward()
    grad = hookg.stored

Generate the heatmap

w = grad[0].mean(dim=[1,2], keepdim=True)
cam_map = (w * act[0]).sum(0)

Check the shape of cam_map

torch.Size([8, 8])

In this case the heatmap is a 8 by 8 square. Note that the shape of cam_map is determined by the item_tfms shape set in the DataBlock. The larger the image used during training the larger the heat map.

We can view the heatmap using show_image

show_image(cam_map, figsize=(4,4));

The location on the heatmap that has the highest activations is the bright yellow square at position (4, 7).

We can view the heatmap on the image like so:

_,ax = plt.subplots()
ax.imshow(cam_map.detach().cpu(), alpha=0.6, extent=(0, x_dec.shape[1], x_dec.shape[2],0),
              interpolation='bilinear', cmap='magma');

How do you generate the bounding boxes?

Now we need to be able to generate a bounding box around the area with the highest activation.

To do this we need to iterate over the cam_map values and get the indexes of the x and and y locations where the activations are the highest.

cam_map is a 8 by 8 TensorDicom

cam_map, cam_map.shape
(TensorDicom([[-0.0323, -0.0516, -0.0334, -0.0313, -0.0308, -0.0185, -0.0270, -0.0374],
         [-0.0312,  0.0720,  0.1065,  0.0011,  0.0067, -0.0115,  0.0088, -0.0220],
         [ 0.0602,  0.2496,  0.3080,  0.2261,  0.0381,  0.0546,  0.0313, -0.0036],
         [-0.0693,  0.0633,  0.1196,  0.1218,  0.0321,  0.0277,  0.0128,  0.0244],
         [-0.0234,  0.0895,  0.1049,  0.0646,  0.0707,  0.0333,  0.0303,  0.0101],
         [ 0.0038,  0.0531,  0.0158,  0.0226,  0.0879,  0.0443,  0.0624,  0.0166],
         [-0.0166, -0.0376, -0.0312,  0.0028,  0.0026,  0.0564, -0.0378, -0.0465],
         [ 0.0076, -0.0125, -0.0063, -0.0015, -0.0026,  0.0116, -0.0725, -0.0699]],
 torch.Size([8, 8]))

Convert the cam_map into a tensor but first we need to copy the tensor to host memory first by using .cpu()

arr = np.array(cam_map.cpu())
ten_map = tensor(arr)

Find the value of the tensor with the maximum value


Now to get the indexes of the maximum tensor we start by first finding the maximum value in each of the rows in ten_map

val = []
for i in range(0, ten_map.shape[0]):
    index, value = max(enumerate(ten_map[i]), key=operator.itemgetter(1))

We now have the maximum value of each row. Now to get the y_index and confirm that it matches the maximum value

y_index, y_value = max(enumerate(val), key=operator.itemgetter(1))
print(y_index, y_value)
2 tensor(0.3080)

The highest value of 0.1026 is found at index 4(remember that indexes start from 0 so index 4 is the 5th row)

Get the x_index

x_index, x_value = max(enumerate(ten_map[y_index]), key=operator.itemgetter(1))
x_index, x_value
(2, tensor(0.3080))

We can confirm that the highest value (0.1026) can be found at index [4,7]

Now we need to match that index to the image

To do this we need to know the shape of the image

imz = pydicom.dcmread(test_files[0]).pixel_array 
(1024, 1024)

And since the heatmap is broken down into 8 by 8 squares we need to divide the image into 8 by 8 sections so that they correspond to each section of the heatmap. In this case the image is already a square but in reality it is not usually the case.

cms = cam_map.shape[0]

Along the x-axis

x_ = imz.shape[1] // cms

Along the y-axis

y_ = imz.shape[0] // cms

So the image now has 8 by 8 squares each of size 128 by 128.

To match the square with highest activations we multiply the x and y indexes we calculated earlier to the size of each square

x = x_index * x_
y = y_index * y_

Using TensorBBox we can now plot a bounding box around the area of the image which has the highest activations

box = TensorBBox([x,y, (x + (imz.shape[0]//cms)),(y + (imz.shape[1]//cms))])
_,ax = plt.subplots()
ax.imshow(cam_map.detach().cpu(), alpha=0.6, extent=(0, x_dec.shape[1], x_dec.shape[2],0),
              interpolation='bilinear', cmap='magma');
ctx = show_image(imz);


get_cmaps is a handy fmi function from where you can easily view activation maps

We first have to define which layer in the model you want to view. For this case I want to view the last layer

layer = learn.model[0].model.layer4[-1]

get_cmaps allows for some flexibility:

  • sanity = if you want to check actual values
  • show_maps = display the class activation map
  • show_cmap = display the test image with the class activation map super-imposed
get_cmaps(test_one, dls, learn, layer=layer, sanity=True, show_maps=True, show_cmap=True)
No Pneumothorax
TensorBase([0, 1])
TensorBase([0.9962, 0.0038])

The predicted class for this image is Pneumothorax which is tensor 1 with a probability of 0.58.

get_cmaps displays the activation maps of each class (predicted first) and also displays the image with super imposed activation maps


get_boxes creates bounding boxes at locations with the highest activations in each class activation map.

It also includes code updates to TensorBBox and LabelledBBox classes so that you can change the color of the bounding boxes

get_boxes(test_one, dls, learn, layer=layer, sanity=False , show_maps=False, show_img=True, color='red')