Introduction to PyTorch: A Solution
An introduction to how to use PyTorch.
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms, datasets
NOTE: it is recommended to first watch the linked video, "Introduction of how to code in PyTorch", instructed by Rassa Ghavami.
Tensors are mostly the same as numpy arrays (including features such as broadcasting, indexing, and slicing), except that they give us the opportunity to run operations on faster hardware such as a GPU. Let's see some ways of defining tensors:
arr = torch.zeros((256, 256), dtype=torch.int32)
# tensors are allocated on the CPU by default
print(arr.device)
# keep 'size', 'dtype' and 'device' same as arr, but fill with 1
arr2 = torch.ones_like(arr)
# keep 'dtype' and 'device' same as arr, but fill data arbitrarily
arr3 = arr.new_tensor([[1, 2], [3, 4]])
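As a quick illustration of the numpy-like semantics mentioned above, here is a minimal sketch (not part of the original notebook) of broadcasting, indexing, and slicing on tensors:
a = torch.arange(6).reshape(2, 3)   # shape (2, 3)
b = torch.tensor([10, 20, 30])      # shape (3,), broadcast against (2, 3)
print(a + b)                        # broadcasting, just like numpy arrays
print(a[0, 1])                      # indexing a single element
print(a[:, 1:])                     # slicing columns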
In order to feed tensors to deep-learning models, they should follow a customary shape convention: B C H W for 4D tensors, where B is the batch size, C is the channel dimension, and H W are the spatial dimensions.
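For example, a batch of 8 single-channel 28x28 images (the MNIST shape used later in this notebook) would be laid out as follows; this snippet is only illustrative:
batch = torch.randn(8, 1, 28, 28)   # B=8, C=1, H=W=28
print(batch.shape)                  # torch.Size([8, 1, 28, 28])
single = torch.randn(28, 28)        # a single image without batch/channel dims
print(single.unsqueeze(0).unsqueeze(0).shape)  # add B and C dims -> (1, 1, 28, 28)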
First, we need to determine the device on which all torch tensors (the input, the learnable weights, etc.) are going to be allocated. Basically, the GPU is the first priority:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
It is often recommended to seed the pseudo-random number generators, as this provides a fair comparison between different configurations of deep-learning model(s). torch provides this via torch.manual_seed.
np.random.seed(12345)
# same seed on all devices; both CPU and CUDA
torch.manual_seed(12345)
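As a quick sanity check (an illustrative snippet, not from the original notebook), re-seeding reproduces the same "random" tensor:
torch.manual_seed(12345)
first = torch.rand(3)
torch.manual_seed(12345)
second = torch.rand(3)
print(torch.equal(first, second))  # True: same seed, same sequence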
From now on, you will learn how to build and train a CNN model. PyTorch models are defined as Python classes inheriting from torch.nn.Module. Two functions are essential for model creation: __init__() and forward(). So let's create a multi-class classification CNN model (with ten ground-truth labels) containing the following layers: Conv -> ReLU -> BatchNorm -> Conv -> ReLU -> BatchNorm -> Adaptive average pooling -> Dropout -> Fully connected. Suppose the input has only one channel and that forward() will only return the output of the model.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # two conv blocks: Conv -> ReLU -> BatchNorm
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, 1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, 3, 1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
        )
        # global average pooling squeezes the spatial dims to 1x1
        self.glob_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        # Dropout -> Fully connected, mapping 64 features to 10 classes
        self.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(64, 10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = self.glob_avg_pool(x)   # (B, 64, 1, 1)
        x = x.flatten(1)            # (B, 64)
        x = self.fc(x)              # (B, 10) logits
        return x
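A quick shape check (an illustrative snippet, not part of the original notebook) confirms the model maps a B C H W batch to per-class logits:
_model = Model()                   # hypothetical throwaway instance
dummy = torch.randn(2, 1, 28, 28)  # two fake single-channel images
print(_model(dummy).shape)         # torch.Size([2, 10])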
Previously, we determined which device (GPU or CPU) is going to be used, but the parameters of the model have not been allocated on it yet. PyTorch's .to(device) API provides this for us.
model = Model()
model.to(device)
There are two phases for a PyTorch model: .train() and .eval(). Models are in the .train() phase by default. The difference between the two is that in the .eval() phase some layers change their behavior during inference; for instance, dropout is deactivated, and batch normalization stops updating its estimated mean and variance and only uses them for normalization. Please note that .eval() does not block parameters from being updated; therefore, during evaluation, besides model.eval(), we should ensure that backpropagation is temporarily deactivated, which is possible with torch.no_grad(). Indeed, disabling the gradient calculation enables us to use bigger batch sizes, as it speeds up computation and reduces memory usage.
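Put together, a typical evaluation step would look like the following sketch (illustrative only; this notebook does not run an evaluation loop):
model.eval()                  # switch dropout/batchnorm to inference behavior
with torch.no_grad():         # disable gradient tracking
    dummy = torch.randn(4, 1, 28, 28, device=device)
    preds = model(dummy).argmax(dim=-1)
print(preds)
model.train()                 # switch back before resuming training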
Before training, we need to prepare and process our dataset, which here is MNIST.
PIL images should first be transformed into torch tensors. torchvision.transforms.Compose provides a pipeline of transforms. In the following, only the conversion to tensors is applied:
transform = transforms.Compose([
    transforms.ToTensor()
])
As evaluation is not the purpose of this notebook, you only need to load the training set of the MNIST dataset using torchvision.datasets.MNIST.
train = datasets.MNIST(
    root='data',
    train=True,
    transform=transform,
    download=True)
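To see what ToTensor produced, we can inspect one sample (illustrative only):
img, label = train[0]            # dataset indexing applies the transform
print(img.shape, img.dtype)      # torch.Size([1, 28, 28]) torch.float32
print(img.min().item(), img.max().item())  # pixel values scaled into [0, 1]
print(label)                     # integer class label, e.g. 5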
Define the train loader using torch.utils.data.DataLoader.
batch_size = 32
train_loader = torch.utils.data.DataLoader(
    dataset=train,
    batch_size=batch_size,
    shuffle=True)
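The loader yields (images, labels) batches in the B C H W form discussed earlier; a quick illustrative check:
X, y = next(iter(train_loader))
print(X.shape)   # torch.Size([32, 1, 28, 28])
print(y.shape)   # torch.Size([32])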
Here we are going to develop the training process for MNIST classification.
Define your optimizer using torch.optim.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
Implement the training procedure in the following cell. Please note that evaluation is not the purpose of this notebook; therefore, only report the training loss, which ought to be descending in general. Consider cross entropy as the loss function and compute it without using pre-defined APIs. The backpropagation consists of three sub-parts: computing the gradients (loss.backward()), updating the parameters (optimizer.step()), and zeroing the gradients (optimizer.zero_grad()). Fortunately, we don't need to implement them from scratch, as PyTorch provides APIs for them.
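As a sanity check for the manual loss used below (an illustrative snippet, not from the original notebook): cross entropy is the negative log-probability assigned to the true class, averaged over the batch, and the log-softmax formulation matches PyTorch's built-in version:
logits = torch.randn(4, 10)                  # fake batch of logits
targets = torch.tensor([0, 3, 9, 1])         # fake ground-truth labels
log_probs = F.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(4), targets].mean()
builtin = F.cross_entropy(logits, targets)
print(torch.allclose(manual, builtin))       # True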
num_epochs = 3
num_iters = len(train_loader)
train_losses = np.zeros((num_epochs, num_iters), dtype=np.float32)
model.train()  # ensure training-mode behavior (dropout active, batchnorm updating)
for epoch in range(num_epochs):
    for it, (X, y) in enumerate(train_loader):
        X = X.to(device)
        y = y.to(device)
        logits = model(X)
        # manual cross entropy: negative mean log-probability of the true class
        log_probs = F.log_softmax(logits, -1)
        loss = -1 * torch.mean(log_probs[torch.arange(log_probs.shape[0]), y])
        # .item() extracts the Python scalar so it can be stored in a numpy array
        train_losses[epoch, it] = loss.item()
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters
        optimizer.zero_grad()  # zero gradients for the next iteration
        if it and it % 200 == 0:
            print(f"Epoch {epoch + 1}: average loss after {it} iterations is {train_losses[epoch, :it].mean()}", flush=True)