Introduction to PyTorch: A solution

An introduction to using PyTorch

In [1]:
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms, datasets

NOTE: it is recommended to watch this link, "Introduction to how to code in PyTorch" by Rassa Ghavami, beforehand.

What is a Tensor?

A tensor is mostly the same as a NumPy array (including operations such as broadcasting, indexing and slicing), except that it gives us the opportunity to run operations on faster hardware such as a GPU. Let's see some tensor definitions.

In [2]:
arr = torch.zeros((256, 256), dtype=torch.int32)

# tensors are allocated on the CPU by default
print(arr.device)

# keep 'size', 'dtype' and 'device' same as arr, but fill with 1
arr2 = torch.ones_like(arr)

# keep 'dtype' and 'device' same as arr, but fill data arbitrarily
arr3 = arr.new_tensor([[1, 2], [3, 4]])
cpu
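
For instance, the familiar NumPy-style operations mentioned above work directly on tensors. A minimal sketch (the variable names are just illustrative):

x = torch.arange(12).reshape(3, 4)

# indexing and slicing behave like numpy
first_row = x[0]
second_col = x[:, 1]

# broadcasting: a (3, 4) tensor plus a (4,) tensor
shifted = x + torch.tensor([10, 20, 30, 40])

# conversion to and from numpy (the CPU tensor and the array share memory)
np_arr = x.numpy()
back = torch.from_numpy(np_arr)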

In order to feed tensors to deep-learning models, they should follow a customary shape convention: B C H W for 4D tensors, where B is the batch size, C is the channel dimension, and H and W are the spatial dimensions.
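
As a quick illustration (a small sketch with arbitrary sizes), a single-channel 28x28 image can be lifted to this 4D form with unsqueeze:

img = torch.rand(28, 28)       # H W
img = img.unsqueeze(0)         # C H W  -> torch.Size([1, 28, 28])
batch = img.unsqueeze(0)       # B C H W -> torch.Size([1, 1, 28, 28])
print(batch.shape)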

Device determination

First we need to determine the device on which all torch tensors (including the inputs, learnable weights, etc.) are going to be allocated. Basically, the GPU is the first priority.

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Pseudo-random number generation

It is often recommended to seed the pseudo-random number generator, as it provides a fair comparison between different configurations of deep learning model(s). PyTorch provides this via torch.manual_seed.

In [4]:
np.random.seed(12345)

# same seed on all devices; both CPU and CUDA
torch.manual_seed(12345)
Out[4]:
<torch._C.Generator at 0x7fa014197470>

Build a CNN model

From now on, you will learn how to build and train a CNN model.

PyTorch models are defined as Python classes that inherit from torch.nn.Module. Two functions are essential for model creation:

  1. The learnable weights (parameters) and network layers are defined within __init__().
  2. The forward pass of the model is implemented within forward().

So let's create a multi-class classification CNN model (with ten ground-truth labels) containing the following layers: Conv -> ReLU -> BatchNorm -> Conv -> ReLU -> BatchNorm -> Adaptive average pooling -> Dropout -> Fully connected. Suppose the input has only one channel and that forward() only returns the output of the model.

In [5]:
class Model(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
                        nn.Conv2d(1, 32, 3, 1),
                        nn.ReLU(),
                        nn.BatchNorm2d(32),
                        nn.Conv2d(32, 64, 3, 1),
                        nn.ReLU(),
                        nn.BatchNorm2d(64),
                    )
        self.glob_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Sequential(
                    nn.Dropout(0.5),
                    nn.Linear(64, 10),
                )
        
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = self.glob_avg_pool(x)
        x = x.flatten(1)
        x = self.fc(x)
        
        return x
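
As a sanity check, a dummy forward pass should produce one score per class. A small sketch (the batch size and spatial size are arbitrary):

dummy = torch.randn(4, 1, 28, 28)   # B C H W, single-channel input
out = Model()(dummy)
print(out.shape)                    # torch.Size([4, 10]), one logit per class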

Set model device

Previously, we determined which device (GPU or CPU) is going to be used, but the model's parameters have not been allocated on it yet. PyTorch's .to(device) API provides this for us.

In [6]:
model = Model()

model.to(device)
Out[6]:
Model(
  (conv): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU()
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (glob_avg_pool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=64, out_features=10, bias=True)
  )
)

Model phases

There are two phases for a PyTorch model: .train() and .eval(). Models are in the .train() phase by default. The difference between the two is that in the .eval() phase some layers change their behavior during inference; for instance, dropout is deactivated, and batch normalization stops updating its estimated mean and variance and only uses them for normalization. Note that .eval() does not prevent parameters from being updated. Therefore, during evaluation, besides calling model.eval(), we should make sure that backpropagation is temporarily deactivated, which is possible with torch.no_grad(). Indeed, disabling gradient calculation lets us use bigger batch sizes, as it speeds up computation and reduces memory usage.
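
A minimal evaluation sketch under these rules (val_loader is a hypothetical DataLoader for a held-out split, not defined in this notebook):

model.eval()                      # dropout off, batchnorm uses its running statistics
with torch.no_grad():             # no graph is built, so no gradients are computed
    for X, y in val_loader:
        X, y = X.to(device), y.to(device)
        preds = model(X).argmax(dim=-1)
        # ... accumulate accuracy, loss, etc. here ...
model.train()                     # switch back before resuming training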

Data processing

Before training, we need to prepare and process our dataset, which here is MNIST.

Data transformation

PIL images should first be transformed into torch tensors. torchvision.transforms.Compose provides a pipeline of transforms. In the following, only 'converting to tensors' is applied.

In [7]:
transform = transforms.Compose([
    transforms.ToTensor()
])

Download data

As evaluation is not the purpose of this notebook, you only need to load the training split of the MNIST dataset using torchvision.datasets.MNIST.

In [8]:
train = datasets.MNIST(
               root='data',
               train=True, 
               transform=transform,
               download=True)

Data loader

Define the train loader using torch.utils.data.DataLoader.

In [9]:
batch_size = 32

train_loader = torch.utils.data.DataLoader(
                    dataset=train,
                    batch_size=batch_size,
                    shuffle=True)
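
To see the B C H W convention from earlier in practice, one batch can be pulled from the loader (a quick sketch):

X, y = next(iter(train_loader))
print(X.shape)   # torch.Size([32, 1, 28, 28]) -> B C H W
print(y.shape)   # torch.Size([32]), one label per image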

Training

Here we are going to develop the training process for MNIST classification.

Optimizer

Define your optimizer using torch.optim.

In [10]:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

Procedure

Implement the training procedure in the following cell. Please note that evaluation is not the purpose of this notebook, therefore only report how the training loss changes, which ought to be descending in general. Use cross entropy as the loss function and compute it without using pre-defined APIs. The backpropagation consists of three sub-parts:

  1. gradient computation
  2. updating learning parameters
  3. removing current computed gradients for next iteration

Fortunately, we don't need to implement them from scratch, as PyTorch provides APIs for them.

In [11]:
num_epochs = 3
num_iters = len(train_loader)
train_losses = np.zeros((num_epochs, num_iters), dtype=np.float32) 

for epoch in range(num_epochs):
    for it, (X, y) in enumerate(train_loader):
        X = X.to(device)
        y = y.to(device)
        x = model(X)
        
        # cross entropy: log-softmax followed by the negative log-likelihood of the true class
        x = F.log_softmax(x, -1)
        loss = -1 * torch.mean(x[torch.arange(x.shape[0]), y])
        train_losses[epoch, it] = loss.item()
        
        loss.backward()        # 1. gradient computation
        optimizer.step()       # 2. updating learning parameters
        optimizer.zero_grad()  # 3. removing current gradients for the next iteration
        
        if it and it % 200 == 0:
            print(f"Epoch {epoch + 1}: average loss after {it} iteration is {train_losses[epoch, :it].mean()}", flush=True)
Epoch 1: average loss after 200 iteration is 2.157982110977173
Epoch 1: average loss after 400 iteration is 2.090942144393921
Epoch 1: average loss after 600 iteration is 2.034151077270508
Epoch 1: average loss after 800 iteration is 1.9887149333953857
Epoch 1: average loss after 1000 iteration is 1.9475260972976685
Epoch 1: average loss after 1200 iteration is 1.9116764068603516
Epoch 1: average loss after 1400 iteration is 1.8755967617034912
Epoch 1: average loss after 1600 iteration is 1.842836856842041
Epoch 1: average loss after 1800 iteration is 1.812255620956421
Epoch 2: average loss after 200 iteration is 1.4998501539230347
Epoch 2: average loss after 400 iteration is 1.4850534200668335
Epoch 2: average loss after 600 iteration is 1.4689152240753174
Epoch 2: average loss after 800 iteration is 1.451676368713379
Epoch 2: average loss after 1000 iteration is 1.4342695474624634
Epoch 2: average loss after 1200 iteration is 1.41796875
Epoch 2: average loss after 1400 iteration is 1.4022103548049927
Epoch 2: average loss after 1600 iteration is 1.3869268894195557
Epoch 2: average loss after 1800 iteration is 1.3722786903381348
Epoch 3: average loss after 200 iteration is 1.2157230377197266
Epoch 3: average loss after 400 iteration is 1.2037761211395264
Epoch 3: average loss after 600 iteration is 1.1866754293441772
Epoch 3: average loss after 800 iteration is 1.1747138500213623
Epoch 3: average loss after 1000 iteration is 1.1625813245773315
Epoch 3: average loss after 1200 iteration is 1.1456233263015747
Epoch 3: average loss after 1400 iteration is 1.131696343421936
Epoch 3: average loss after 1600 iteration is 1.1196773052215576
Epoch 3: average loss after 1800 iteration is 1.1082934141159058