```python
from fastai.vision.all import *
```
Satyabrata Pal
February 23, 2022
`get_image_files` is a convenience function that walks through folders and subfolders and gathers all the image file paths.
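The call that produced the listing below isn't shown on the page, but it would look something like this (a minimal sketch; the directory matches the paths in the output, while the variable names are assumptions):

```python
# point at the folder that holds the training images
train_img_path = Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128')

# recursively collect every image file path under that folder
files = get_image_files(train_img_path)
files[:10]
```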
(#10) [Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/80b5373b87942b.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/e113b51585c677.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/94eb976e25416c.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/19a45862ab99cd.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/be9645065510e9.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/76d25044120d3c.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/0c64a705ba5d31.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/c1fe278bdbd837.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/6f94f30ac500a0.jpg'),Path('../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/15f295322e5b54.jpg')]
Let's see what is in the CSV file.
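A sketch of loading it with pandas (the CSV path is an assumption based on the Happywhale competition layout; `train_path_df` is the name used later in this post):

```python
import pandas as pd

# read the competition's training labels
train_path_df = pd.read_csv('../input/happy-whale-and-dolphin/train.csv')
train_path_df.head()
```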
| | image | species | individual_id |
|---|---|---|---|
| 0 | 00021adfb725ed.jpg | melon_headed_whale | cadddb1636b9 |
| 1 | 000562241d384d.jpg | humpback_whale | 1a71fbb72250 |
| 2 | 0007c33415ce37.jpg | false_killer_whale | 60008f293a2b |
| 3 | 0007d9bca26a99.jpg | bottlenose_dolphin | 4b00fe572063 |
| 4 | 00087baf5cef7a.jpg | humpback_whale | 8e5253662392 |
If I grab the image filename from the dataframe and attach it to `train_img_path`, I get the full image path.
Replacing filenames with the full file path
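A sketch of that replacement, assuming the `train_img_path` and `train_path_df` names from above:

```python
# prepend the image directory to every filename so each row holds a full path
train_path_df['image'] = train_path_df['image'].apply(lambda fname: str(train_img_path/fname))
train_path_df.head(2)
```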
| | image | species | individual_id |
|---|---|---|---|
| 0 | ../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/00021adfb725ed.jpg | melon_headed_whale | cadddb1636b9 |
| 1 | ../input/jpeg-happywhale-128x128/train_images-128-128/train_images-128-128/000562241d384d.jpg | humpback_whale | 1a71fbb72250 |
A `DataBlock` is like a blueprint that tells fastai:

* What kind of data we are dealing with: image, text, etc.
* What kind of labels we have, for example categorical labels or continuous labels.
* Where to get the inputs.
* Where to get the targets.
* How to split the data into training and validation sets.
* The transforms we want to apply.
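The original notebook cell isn't reproduced here, but a `DataBlock` along the lines described next would look roughly like this (a minimal sketch; the `ColReader` indices follow the text, while the splitter and transform choices are assumptions):

```python
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # image inputs, categorical targets
    get_x=ColReader(0),                               # image path from column 0 of the dataframe
    get_y=ColReader(1),                               # species label from column 1
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # random train/validation split
    item_tfms=Resize(128),                            # per-item transform (resize every image)
    batch_tfms=aug_transforms()                       # batch-level augmentations on the GPU
)
```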
In the above code:

* `ImageBlock` tells fastai that we are dealing with image data.
* `CategoryBlock` means our targets are categorical in nature.
* `get_x` and `get_y` tell fastai to get the x, i.e. the image path, from column 0 of the dataframe and the target from column 1.
* `splitter` defines how the data is split into training and validation sets; check here for more details.
* The last two arguments are not that important to understand right now, but they are needed when you work on image augmentation. Refer to the fastbook to learn more about this.
Think of a dataloader as a mechanism that picks up your images, does everything you described in the datablock, and then loads your data onto the device (CPU or GPU).
At a minimum, you need to provide the source of the data, `train_path_df` in our case. Here we also pass `bs`, i.e. the batch size. If the batch size is 8, the dataloader loads 8 images at a time onto the device.
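Building the dataloaders from the datablock then takes one line (the batch size here is just the example value from the text):

```python
# create train/validation dataloaders from the dataframe, 8 images per batch
dls = dblock.dataloaders(train_path_df, bs=8)
```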
Simple CNN to classify whale/dolphin species. The intuition is the following:

* A model that learns to recognize whale/dolphin species should also learn the features that distinguish the different species.
* Such a model can be queried to reveal which portions of an image it takes into account. For example, if the background is somehow tricking the model into treating it as an area of interest, then the data needs further engineering.
`cnn_learner` creates a convolutional neural network for you. At a minimum you provide it the dataloaders and a model architecture; fastai infers a suitable loss function from the data, though you can pass your own.
`models.resnet18` is a "pre-trained" neural network. It's a network that was already trained by some good folks (folks with huge computational hardware) on a very large number of images. This particular network knows how to recognize many different kinds of images. So, we take this network and ask fastai to train it further on our image data (whales and dolphins).

The intuition is that since the pre-trained network already knows how to distinguish between different images, we have to spend less time and computation to teach it to recognize different whales and dolphins. This is transfer learning.
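A sketch of creating the learner (the `metrics` choice is an assumption; fastai picks a loss function from the dataloaders):

```python
# build a learner around our data and a pre-trained resnet18 backbone
learn = cnn_learner(dls, models.resnet18, metrics=accuracy)
```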
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
For now, understand that `fine_tune` means "take my network and make it look at each image a specified number of times." This "number of times" is the number provided as the first argument.

Let's not dive into the second argument at this moment.

For the curious mind: the second argument is the learning rate, a hyperparameter of the neural network. More information can be found here.
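A sketch of the call, matching the warm-up epoch plus two fine-tuning epochs in the tables below (the learning-rate value is an assumption):

```python
# one frozen warm-up epoch, then 2 epochs with the full network unfrozen
learn.fine_tune(2, 3e-3)
```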
| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.905516 | 0.673760 | 03:58 |

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.422199 | 0.337718 | 02:24 |
| 1 | 0.231187 | 0.232147 | 02:24 |
We can see what the model predicted vs the actual target.
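fastai has a one-liner for this:

```python
# show a grid of images with their actual and predicted labels
learn.show_results()
```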
We can use class activation maps (CAM) to see which regions of an image are of interest to the neural network. CAM was first introduced by Bolei Zhou et al. in "Learning Deep Features for Discriminative Localization". It combines the output of the last convolutional layer with the predictions to give a heatmap of the regions of an image the network found interesting.

The intuition behind CAM is that if we take the dot product of the activations of the final convolutional layer with the final weights, at each location of our feature map, we get the score of the features that were used to make the decision.
PyTorch provides hooks that can be attached to any layer of a network. A hook is executed either during the forward pass, i.e. when the output is computed, or during the backward pass, i.e. when the gradients are computed.
I would highly recommend reading more about CAM in this chapter of the fastbook before proceeding with the code.
Let’s grab a batch.
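For example, using the names from above:

```python
# pull one batch of images and labels from the dataloader
x, y = dls.one_batch()
```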
We hook into the network. A forward hook receives three things: the module, its input, and its output. The hook function in the code below is the piece that stores that output.
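A minimal hook in the style of the fastbook CAM chapter (a sketch, not the exact cell from the original notebook):

```python
class Hook():
    def hook_func(self, m, i, o):
        # m: the module, i: its input, o: its output;
        # keep a detached copy of the output activations
        self.stored = o.detach().clone()
```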
Now we take the dot product of our weight matrix with the activations. The code below hooks into the last convolutional layer of the model using our hook function, stores the activations in the `act` variable, and then takes the dot product of the stored activations with the weights using `torch.einsum`.
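A sketch of that flow, assuming fastai's default `cnn_learner` layout, where the convolutional body sits at `learn.model[0]` and the final linear layer of the head at `learn.model[1][-1]`:

```python
hook_output = Hook()
hook = learn.model[0].register_forward_hook(hook_output.hook_func)

with torch.no_grad():
    output = learn.model.eval()(x)   # forward pass; the hook stores the activations
act = hook_output.stored             # shape (bs, 512, 7, 7) for a resnet18 body

# dot the classifier weights (n_classes x 512) with the activations at every
# spatial location -> one 7x7 map per image per class
cam_map = torch.einsum('ck,bkij->bcij', learn.model[1][-1].weight, act)
hook.remove()                        # detach the hook once we have what we need
```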
For each image in our batch, and for each class, we get a 7×7 feature map that tells us where the activations were higher and where they were lower. This will let us see which areas of the pictures influenced the model’s decision.
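To overlay one of these maps on its image, something like the fastbook plotting recipe works (a sketch; the 128x128 extent matches this dataset's image size):

```python
img_idx = 0
x_dec = TensorImage(dls.train.decode((x,))[0][img_idx])  # decode back to a viewable image
_, ax = plt.subplots()
x_dec.show(ctx=ax)
ax.imshow(cam_map[img_idx, y[img_idx]].detach().cpu(), alpha=0.6,
          extent=(0, 128, 128, 0), interpolation='bilinear', cmap='magma')
```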
The bright areas in the activations are spilling over into the ocean. In my opinion this should not be the case: the activations should be concentrated around the dorsal fin and its surface. In layman's terms, the network should focus on the object, i.e. the fin features, and not on the surrounding water.
An important takeaway from my perspective is that if we can restrict the region of interest to the whale fin and not the surrounding water then the network would be able to learn better.
Besides tasks like classifying images or detecting objects, a neural network can be used to analyze the data itself and devise strategies for pre-processing, data engineering, and data cleaning.
Here I have bounced around one idea: using a simple CNN to understand which regions of an image the network focuses on to make its decisions. By analyzing those regions we can observe whether the network is focusing on the areas it should, and if it is not, we can think about the steps to take to help it focus and generalize better.
I have just scratched the surface of what's possible here; there are many more ways in which deep learning can be used to make better decisions about data. So, I would encourage you to expand on this idea and think of other ways to use simple networks to analyze your data before building an architecture for the actual task at hand.