Introduction to PyTorch
An introduction to using PyTorch
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms, datasets
NOTE: before starting, it is recommended to watch this link, "Introduction of how to code in Pytorch", presented by Rassa Ghavami.
A tensor is largely the same as a NumPy array (including operations such as broadcasting, indexing and slicing), except that it can also run on faster hardware such as a GPU. Let's look at some tensor definitions.
arr = torch.zeros((256, 256), dtype=torch.int32)
# tensors are created on the CPU by default
print(arr.device)
# keep 'size', 'dtype' and 'device' same as arr, but fill with 1
arr2 = torch.ones_like(arr)
# keep 'dtype' and 'device' same as arr, but fill data arbitrarily
arr3 = arr.new_tensor([[1, 2], [3, 4]])
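The NumPy-like behaviour mentioned above can be sketched as follows. The tensor names and values are made up for illustration; only standard `torch` and `numpy` calls are used.

```python
import numpy as np
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
row = torch.tensor([10.0, 20.0, 30.0])

# broadcasting: the (3,) row is added to every row of the (2, 3) tensor
b = a + row

# indexing and slicing follow the NumPy conventions
last_column = a[:, -1]

# conversion between the two libraries (for CPU tensors the memory is shared)
as_numpy = b.numpy()
back_to_torch = torch.from_numpy(as_numpy)
```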
To feed tensors to deep-learning models, they should follow a customary shape convention: B × C × H × W for 4D tensors, where B is the batch size, C is the channel dimension, and H and W are the spatial dimensions.
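As a small illustration of this convention, the sketch below turns grayscale images without batch or channel axes into 4D tensors. The 28 x 28 size and the variable names are assumptions chosen for the example.

```python
import torch

# a single H x W grayscale image (size chosen arbitrarily)
image = torch.rand(28, 28)
# add a batch axis and a channel axis in front: (1, 1, 28, 28)
single_batch = image.unsqueeze(0).unsqueeze(0)

# a stack of 32 images without a channel axis: (32, 28, 28)
images = torch.rand(32, 28, 28)
# insert the channel axis at position 1: (32, 1, 28, 28)
batch = images.unsqueeze(1)
```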
First, we need to decide on which device all torch tensors (inputs, learnable weights, etc.) will be allocated. The GPU is preferred whenever one is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
It is often recommended to fix the seed of the pseudo-random number generators, since reproducible runs allow a fair comparison between different configurations of deep learning model(s). torch provides this through torch.manual_seed.
np.random.seed(12345)
# same seed on all devices; both CPU and CUDA
torch.manual_seed(12345)
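A quick way to see what seeding buys us is to draw random numbers twice from the same seed. This is only a sketch; the variable names are illustrative.

```python
import torch

torch.manual_seed(12345)
first = torch.rand(3)

# resetting the seed restarts the generator, so the same numbers come out again
torch.manual_seed(12345)
second = torch.rand(3)

same = torch.equal(first, second)
```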
In this section, you will learn how to build and train a CNN model.
PyTorch models are defined as Python classes that inherit from torch.nn.Module. Two methods are essential for model creation: __init__() and forward(). So let's create a multi-class classification CNN model (with ten ground-truth labels) containing the following layers: Conv -> ReLU -> Batchnorm -> Conv -> ReLU -> Batchnorm -> Adaptive average pooling -> dropout -> fully connected. Suppose the input has only one channel, and forward() returns only the output of the model.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # your code here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # your code here
        return x
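For reference, here is one possible reading of the layer list, not the official solution. The class name `SketchModel`, the channel widths (16 and 32), the kernel size and the dropout probability are all assumptions; the text only fixes the layer order, the single input channel and the ten output classes.

```python
import torch
from torch import nn


class SketchModel(nn.Module):
    """Hypothetical example; all sizes below are assumptions."""

    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.AdaptiveAvgPool2d(1),  # output is (B, 32, 1, 1) for any H, W
        )
        self.dropout = nn.Dropout(p_drop)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)  # (B, 32)
        x = self.dropout(x)
        return self.fc(x)  # raw logits, shape (B, num_classes)


logits = SketchModel()(torch.zeros(4, 1, 28, 28))
```

Thanks to the adaptive pooling layer, the same model accepts inputs of different spatial sizes.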
Previously, we determined which device (GPU or CPU) will be used, but the parameters of the model have not been moved to it yet. PyTorch's .to(device) API does this for us.
model = Model()
model.to(device)
A PyTorch model has two modes: .train() and .eval(). Models are in .train() mode by default. The difference is that in .eval() mode some layers change their behavior for inference: for instance, dropout is deactivated, and batch normalization stops updating its running mean and variance estimates and only uses them for normalization. Note, however, that .eval() does not prevent parameters from being updated. Therefore, during evaluation, besides calling model.eval(), we should make sure that gradient computation is temporarily disabled, which is possible with torch.no_grad(). Disabling gradient calculation speeds up the computation and reduces memory usage, which in turn allows bigger batch sizes.
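The two mechanisms can be observed separately on a toy example. The sketch below uses a standalone dropout layer and a small tensor `w`; both are illustrative and not part of the notebook's model.

```python
import torch
from torch import nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # entries are either zeroed or scaled by 1 / (1 - p) = 2

layer.eval()
eval_out = layer(x)  # dropout is inactive: the input passes through unchanged

w = torch.ones(3, requires_grad=True)
with torch.no_grad():
    untracked = (w * 2).sum()  # no graph is recorded inside the context
tracked = (w * 2).sum()  # outside the context, autograd tracks the operation
```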
Before training, we need to prepare and process our dataset, which is MNIST here.
PIL images should first be transformed to torch tensors. torchvision.transforms.Compose provides a pipeline of transforms. In the following, only the conversion to tensors is applied.
transform = transforms.Compose([
    transforms.ToTensor()
])
Since evaluation is not the purpose of this notebook, you only need to load the training set of the MNIST dataset using torchvision.datasets.MNIST.
# your code here
train = None
Define the train loader using torch.utils.data.DataLoader.
batch_size = 32
# your code here
train_loader = None
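The sketch below shows how a loader batches a dataset. To stay runnable without a download, it uses random tensors shaped like MNIST instead of the real data; the names `fake_train` and `loader` are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# On the real data, the dataset would come from something like
#   datasets.MNIST(root="./data", train=True, download=True, transform=transform)
# where the root path is an assumption.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_train = TensorDataset(fake_images, fake_labels)

# shuffle=True reshuffles the samples at every epoch
loader = DataLoader(fake_train, batch_size=32, shuffle=True)

X, y = next(iter(loader))
```

With 100 samples and a batch size of 32, the loader yields four batches per epoch, the last one being smaller.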
Here we are going to develop the training process for MNIST classification.
Define your optimizer using torch.optim.
# your code here
optimizer = None
Implement the training procedure in the following cell. Please note that evaluation is not the purpose of this notebook, so only report the training loss, which should generally decrease. Use cross entropy as the loss function and compute it without using pre-defined APIs. The backpropagation consists of three sub-parts:
Fortunately, we don't need to implement them from scratch, as PyTorch provides APIs for them.
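One way to write cross entropy by hand is sketched below, assuming the model returns raw logits of shape (B, num_classes) and the targets are integer class indices. The helper name `manual_cross_entropy` is hypothetical; the built-in `F.cross_entropy` appears only as a check.

```python
import torch
from torch.nn import functional as F


def manual_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # log-softmax computed through logsumexp for numerical stability
    log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # pick the log-probability of the correct class for every sample
    picked = log_probs[torch.arange(logits.shape[0]), targets]
    # average negative log-likelihood over the batch
    return -picked.mean()


torch.manual_seed(0)
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))

loss = manual_cross_entropy(logits, targets)
reference = F.cross_entropy(logits, targets)
```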
num_epochs = 3
num_iters = len(train_loader)
train_losses = np.zeros((num_epochs, num_iters), dtype=np.float32)

for epoch in range(num_epochs):
    for it, (X, y) in enumerate(train_loader):
        ## forward model
        ## compute loss
        ## backpropagation
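The shape of a training step can be sketched on toy data. Everything here is an assumption made for the example: the tiny linear model, the random batch, SGD with a learning rate of 0.01, and the use of the built-in `F.cross_entropy` for brevity (the exercise asks for a hand-written loss).

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.01)

# one fixed random batch stands in for the data loader
X = torch.rand(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

losses = []
for step in range(20):
    logits = toy_model(X)              # forward pass
    loss = F.cross_entropy(logits, y)  # loss computation
    toy_optimizer.zero_grad()          # clear gradients from the previous step
    loss.backward()                    # compute gradients
    toy_optimizer.step()               # update the parameters
    losses.append(loss.item())
```

The order of `zero_grad`, `backward` and `step` matters: without clearing, gradients from earlier iterations accumulate.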
Hooks are magic. In PyTorch, as everywhere else, they are functions triggered at certain points of a process. Currently, PyTorch modules support three hooks: register_forward_pre_hook, register_forward_hook and register_backward_hook (recent releases deprecate the last one in favor of register_full_backward_hook).
register_forward_pre_hook triggers before the forward pass of a module starts, so let's see how it works :). Write a hook that checks that the input x of your forward function has exactly 4 dimensions, and raises an exception otherwise.
### your code here
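# create the inputs on the same device as the model
incorrect_x = torch.zeros(1, 512, 512, device=device)
correct_x = torch.zeros(1, 1, 512, 512, device=device)
incorrect_x2 = torch.zeros(1, 1, 1, 512, 512, device=device)

# case 1: a 3D input should be rejected by the hook
try:
    model(incorrect_x)
    print("FAILED case1")
except Exception:
    print("PASSED case1")

# case 2: a 4D input should pass through
try:
    model(correct_x)
    print("PASSED case2")
except Exception:
    print("FAILED case2")

# case 3: a 5D input should be rejected by the hook
try:
    model(incorrect_x2)
    print("FAILED case3")
except Exception:
    print("PASSED case3")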
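The mechanics of a forward pre-hook can be sketched on a standalone layer. The hook name `require_4d_input` and the toy convolution are illustrative; the hook signature (module, tuple of positional inputs) is the one PyTorch passes to pre-hooks.

```python
import torch
from torch import nn


def require_4d_input(module, args):
    # args is the tuple of positional inputs that will be passed to forward()
    x = args[0]
    if x.dim() != 4:
        raise ValueError(f"expected a 4D input (B, C, H, W), got {x.dim()}D")


toy = nn.Conv2d(1, 2, kernel_size=3)
handle = toy.register_forward_pre_hook(require_4d_input)

try:
    toy(torch.zeros(1, 8, 8))  # 3D input: the hook raises before forward() runs
    rejected = False
except ValueError:
    rejected = True

accepted = toy(torch.zeros(1, 1, 8, 8))  # 4D input: passes through

# the returned handle detaches the hook when it is no longer needed
handle.remove()
```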
register_forward_hook triggers at the end of the forward pass of a module. Now write code to visualize the importance score that each layer assigns to the input image. The purpose is to recognize which layer pays attention to which part of the image. Pick a sample input of your choice.
### your code here
### your code here
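A forward hook can record each layer's output as it is produced. The sketch below is a minimal illustration on a toy network; the score used (mean absolute activation across channels) is just one crude choice of "importance" and is an assumption, as are all names.

```python
import torch
from torch import nn

activations = {}


def save_activation(name):
    # forward hooks receive the module, its positional inputs and its output
    def hook(module, args, output):
        activations[name] = output.detach()
    return hook


net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)

# attach a hook to every convolution; keep the handles to remove them later
handles = [
    layer.register_forward_hook(save_activation(f"layer{i}"))
    for i, layer in enumerate(net)
    if isinstance(layer, nn.Conv2d)
]

with torch.no_grad():
    net(torch.rand(1, 1, 28, 28))

# one H x W map per hooked layer: mean absolute activation over channels
score_maps = {name: act.abs().mean(dim=1)[0] for name, act in activations.items()}

for h in handles:
    h.remove()
```

Each entry of `score_maps` can then be displayed as an image, for example with matplotlib's `imshow`.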
Layers and parameters stored in a plain Python list or dictionary are not registered as part of the model, so they do not appear in model.parameters() and will not be trained. Instead, it is recommended to use ModuleList and ModuleDict. If you have used plain containers so far, please change them :).
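The difference is easy to check by counting parameters. The two classes below are hypothetical and differ only in the container holding the layers.

```python
import torch
from torch import nn


class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: the layers are invisible to nn.Module
        self.layers = [nn.Linear(4, 4) for _ in range(3)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList registers every layer as a submodule
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])


plain_count = sum(p.numel() for p in WithPlainList().parameters())
registered_count = sum(p.numel() for p in WithModuleList().parameters())
```

Here `plain_count` is 0, while `registered_count` is 60 (three layers with 16 weights and 4 biases each), so an optimizer built from the first model would have nothing to update.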
Python loops slow down the process considerably and should be replaced by vectorized (parallel) tensor operations as much as possible.
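As a small illustration, the same element-wise computation is written below once as a loop and once as a single tensor expression. The operation itself is arbitrary.

```python
import torch

x = torch.arange(1000, dtype=torch.float32)

# slow: one Python-level operation per element
looped = torch.empty_like(x)
for i in range(x.shape[0]):
    looped[i] = x[i] * 2 + 1

# fast: a single vectorized expression over the whole tensor
vectorized = x * 2 + 1
```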
Parameters can be saved by passing the model's state_dict object to torch.save. They can be loaded back by passing the object returned by torch.load to the model's load_state_dict API.
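The round trip can be sketched as follows. An in-memory buffer stands in for a file path (which would normally be something like "model.pt", a made-up name), and the small linear layers are illustrative.

```python
import io

import torch
from torch import nn

source = nn.Linear(4, 2)

# save only the parameters, not the whole Python object
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)

# load into a freshly created model with the same architecture
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
```

Since only the state dict is stored, the model class has to be constructed again before its parameters can be loaded.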