Introduction to PyTorch

An introduction to how to use PyTorch.

In [ ]:
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms, datasets

NOTE: it is recommended to watch this link about "Introduction of how to code in PyTorch", instructed by Rassa Ghavami, beforehand.

What is a Tensor?

A tensor is mostly the same as a NumPy array (including features like broadcasting, indexing, slicing, etc.), except that it gives us the opportunity to run operations on faster hardware such as a GPU. Let's see some tensor definitions.

In [ ]:
arr = torch.zeros((256, 256), dtype=torch.int32)

# tensors are allocated on the CPU by default
print(arr.device)

# keep 'size', 'dtype' and 'device' same as arr, but fill with 1
arr2 = torch.ones_like(arr)

# keep 'dtype' and 'device' same as arr, but fill data arbitrarily
arr3 = arr.new_tensor([[1, 2], [3, 4]])

In order to feed tensors to deep-learning models, they should follow a customary shape convention: B C H W for 4D tensors, where B is the batch size, C is the channel dimension, and H and W are the spatial dimensions.
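
For illustration, here is a minimal sketch (the 28x28 size is just an example) of arranging a single grayscale image into the B C H W form:

In [ ]:
# a single 28x28 grayscale image
img = torch.rand(28, 28)

# add channel and batch dimensions -> shape (B, C, H, W) = (1, 1, 28, 28)
x = img.unsqueeze(0).unsqueeze(0)
print(x.shape)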

Device determination

First we need to determine the device on which all torch tensors (including the inputs, learnable weights, etc.) are going to be allocated. Basically, the GPU is the first priority.

In [ ]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Pseudo random generation

It is often recommended to seed the pseudo-random number generator, as it provides a fair comparison between different configurations of deep learning model(s). Torch provides this via torch.manual_seed.

In [ ]:
np.random.seed(12345)

# same seed on all devices; both CPU and CUDA
torch.manual_seed(12345)

Build a CNN model

In this section, you will learn how to build and train a CNN model.

PyTorch models are defined as Python classes that inherit from torch.nn.Module. Two methods are essential for model creation:

  1. learning weights (parameters) and network layers are defined within __init__().
  2. the forward procedure of the model is implemented within forward().

So let's create a multi-class classification CNN model (with ten ground-truth labels) containing the following layers: Conv -> ReLU -> BatchNorm -> Conv -> ReLU -> BatchNorm -> Adaptive average pooling -> Dropout -> Fully connected. Suppose the input has only one channel and that forward() returns only the output of the model. A possible implementation sketch appears after the skeleton cell below.

In [ ]:
class Model(nn.Module):
    
    def __init__(self):
        super().__init__()
        # your code here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # your code here    
        
        return x
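
The cell above is left for you to complete. The following is only one possible sketch; the channel widths, kernel sizes and dropout rate are arbitrary assumptions:

In [ ]:
class ExampleModel(nn.Module):
    """One possible realization of the layer list above (sizes are assumptions)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.bn1(F.relu(self.conv1(x)))
        x = self.bn2(F.relu(self.conv2(x)))
        x = self.pool(x).flatten(1)      # (B, 32)
        x = self.dropout(x)
        return self.fc(x)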

Set model device

Previously, we determined which device (GPU or CPU) is going to be used, but the model's parameters have not yet been allocated on it. The PyTorch .to(device) API provides this for us.

In [ ]:
model = Model()

model.to(device)

Model phases

There are two phases for a PyTorch model: .train() and .eval(). Models are in the .train() phase by default. The difference between the two is that in the .eval() phase some layers change their behavior for inference; for instance, dropout is deactivated, and batch normalization stops updating its running estimates of mean and variance and only uses them for normalization. Note that .eval() does not block parameters from being updated. Therefore, during evaluation, besides model.eval() we should make sure that backpropagation is temporarily deactivated, which is possible with torch.no_grad(). Indeed, disabling gradient calculation enables us to use bigger batch sizes, as it speeds up computation and reduces memory usage.
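
As an illustration, a typical evaluation-time pattern looks like the following (a minimal sketch; it assumes the Model class above has been completed and that a 1x1x28x28 input makes sense for it):

In [ ]:
model.eval()                                   # dropout off, batchnorm uses running statistics
with torch.no_grad():                          # no gradients are tracked inside this block
    out = model(torch.zeros(1, 1, 28, 28, device=device))
model.train()                                  # switch back before resuming training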

Data processing

Before training, we need to prepare and process our dataset, which here is MNIST.

Data transformation

PIL images should first be transformed into torch tensors. torchvision.transforms.Compose provides a pipeline of transforms. In the following, only 'converting to tensors' is applied.

In [ ]:
transform = transforms.Compose([
    transforms.ToTensor()
])

Download data

As evaluation is not the purpose of this notebook, you only need to load the training set of the MNIST dataset using torchvision.datasets.MNIST.

In [ ]:
# your code here
train = None
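
If you get stuck, one common pattern looks like this (the root directory is an assumption):

In [ ]:
# downloads MNIST into ./data the first time it runs
train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)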

Data loader

Define the train loader using torch.utils.data.DataLoader.

In [ ]:
batch_size = 32

# your code here
train_loader = None
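
One possible definition (shuffling is usually desirable for training):

In [ ]:
from torch.utils.data import DataLoader

train_loader = DataLoader(train, batch_size=batch_size, shuffle=True)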

Training

Here we are going to develop the training process for MNIST classification.

Optimizer

Define your optimizer using torch.optim.

In [ ]:
# your code here
optimizer = None
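
One possible choice is plain SGD (the learning rate and momentum values are assumptions, not recommendations); it assumes the Model class has been completed so that model.parameters() is non-empty:

In [ ]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)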

Procedure

Implement the training procedure in the following cell. Please note that evaluation is not the purpose of this notebook; therefore, only report the training loss changes, which ought to be descending in general. Consider cross entropy as the loss function and compute it without using pre-defined APIs. The backpropagation consists of three sub-parts:

  1. gradient computation
  2. updating learning parameters
  3. removing current computed gradients for next iteration

Fortunately, we don't need to implement them from scratch, as PyTorch provides APIs for them (a possible sketch follows the cell below).

In [ ]:
num_epochs = 3
num_iters = len(train_loader)
train_losses = np.zeros((num_epochs, num_iters), dtype=np.float32) 

for epoch in range(num_epochs):
    for it, (X, y) in enumerate(train_loader):
        ## forward model
        
        ## compute loss
        
        ## backpropagation
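
A possible sketch of how the cell above could be filled in, with cross entropy written out via log-softmax instead of a pre-defined loss API (the device transfers and the use of torch.logsumexp are choices of this sketch):

In [ ]:
for epoch in range(num_epochs):
    for it, (X, y) in enumerate(train_loader):
        X, y = X.to(device), y.to(device)

        ## forward model
        logits = model(X)                       # expected shape (B, 10)

        ## compute loss: cross entropy written out via log-softmax
        log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        loss = -log_probs[torch.arange(y.shape[0], device=y.device), y].mean()

        ## backpropagation
        optimizer.zero_grad()                   # clear gradients left over from the previous iteration
        loss.backward()                         # compute gradients of the loss w.r.t. the parameters
        optimizer.step()                        # update the learning parameters

        train_losses[epoch, it] = loss.item()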

More notes

Hooks

Hooks are magic, and in PyTorch (just as everywhere else) they are functions triggered at certain points of the process. Currently, PyTorch supports three such hooks on nn.Module: register_forward_pre_hook, register_forward_hook and register_backward_hook.

register_forward_pre_hook

Triggers before the forward pass of a module starts, so let's see how it works :). Write a hook to check that the input x of your forward function has exactly 4 dimensions, and otherwise throw an exception.

In [ ]:
### your code here
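
# A possible sketch (not the only solution): a forward pre-hook receives
# (module, args), where args is the tuple of positional inputs to forward().
def check_input_ndim(module, args):
    if args[0].dim() != 4:
        raise ValueError(f"expected a 4D (B, C, H, W) input, got {args[0].dim()}D")

pre_hook_handle = model.register_forward_pre_hook(check_input_ndim)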

incorrect_x = torch.zeros(1, 512, 512)
correct_x = torch.zeros(1, 1, 512, 512)
incorrect_x2 = torch.zeros(1, 1, 1, 512, 512)

try:
    model(incorrect_x)
    print("FAILED case1")
except:
    print("PASSED case1")

try:
    model(correct_x)
    print("PASSED case2")
except:
    print("FAILED case2")

try:
    model(incorrect_x2)
    print("FAIILED case3")
except:
    print("PASSED case3")

register_forward_hook

Triggers at the end of the forward pass of the module. Now write code to visualize the importance score each layer assigns to the input image; the purpose is to recognize which layer pays attention to which part of the image. Also choose a sample input of your choice.

In [ ]:
### your code here
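
A minimal sketch of the idea (it assumes the Model class registers its layers as sub-modules; the actual visualization, e.g. plotting each stored activation as a heat map, is left out):

In [ ]:
activations = {}

def save_activation(name):
    # returns a hook that stores the module's output under the given name
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# register a forward hook on every child layer of the model
handles = [m.register_forward_hook(save_activation(n)) for n, m in model.named_children()]

# run a sample input through the model; afterwards, the per-channel mean of each
# stored activation could be plotted as a heat map over the image
_ = model(torch.zeros(1, 1, 28, 28, device=device))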

register_backward_hook

Triggers at the end of the backward pass of the module. Gradient clipping is a way to control movement in the optimization space and to avoid overly fast parameter updates, in order to better explore that space. Write a hook to ensure that the gradient calculated for each layer is limited to an **absolute value** of two. Use it, train your model again from scratch, and check how the training has varied. Also **unregister the last hook** to avoid repetitive visualizations.

In [ ]:
### your code here
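
One possible sketch, using register_full_backward_hook (the non-deprecated form of register_backward_hook): it returns clamped copies of the gradients flowing into each layer, which bounds their absolute value at two. The handles list from the previous sketch is an assumption.

In [ ]:
# unregister the visualization hooks from the previous part
for h in handles:
    h.remove()

def clip_grad_hook(module, grad_input, grad_output):
    # return clamped copies of the gradients w.r.t. the module's inputs
    return tuple(g.clamp(-2.0, 2.0) if g is not None else None
                 for g in grad_input)

clip_handles = [m.register_full_backward_hook(clip_grad_hook) for m in model.children()]

Note that this clamps the gradients passed between layers; the parameters' own gradients can also be clamped directly, e.g. with the clip_grad_value_ API mentioned at the end of this notebook.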

Python built-in list and dictionary effect

Layers and parameters stored in a plain Python list or dictionary will not be registered as sub-modules of the model and therefore will not be trained. Instead, it is recommended to use nn.ModuleList and nn.ModuleDict. If you have used a plain list or dictionary above, please change it now :).
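
A quick illustration of the difference (a toy sketch; the layer sizes are arbitrary):

In [ ]:
class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 4)]                 # plain list: NOT registered

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])  # registered as sub-modules

print(len(list(Bad().parameters())))   # 0 -> these weights would never be trained
print(len(list(Good().parameters())))  # 4 (weight and bias of each Linear)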

Avoid looping

Using Python loops slows down the process dramatically; they should be replaced with parallel (vectorized) tensor computation as much as possible.
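
For example, a toy comparison of a Python loop against a single vectorized reduction:

In [ ]:
x = torch.rand(1_000_000)

# slow: Python-level loop over every element
total = 0.0
for v in x:
    total += v.item()

# fast: one vectorized reduction executed in optimized C/CUDA code
total = x.sum().item()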

Save and load

Parameters can be saved using torch.save, passing the model's state_dict object. They can then be loaded by passing the object returned by torch.load to the model's load_state_dict API.
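
For example (the file name is an assumption):

In [ ]:
torch.save(model.state_dict(), 'model_weights.pt')

model = Model()
model.load_state_dict(torch.load('model_weights.pt'))
model.to(device)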

Gradient clipping

Here you have learned how to clip gradients using backward hooks. There is also an alternative: the torch.nn.utils.clip_grad_value_ API.
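
It is applied between loss.backward() and optimizer.step(), for example:

In [ ]:
loss.backward()
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=2.0)
optimizer.step()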