This post dives headfirst into PyTorch, a powerful open-source machine learning library for Python.
Google's flagship ML library is TensorFlow, whereas PyTorch was developed by Facebook.
Researchers are gravitating towards PyTorch because of its flexibility and efficiency. This tutorial goes over the basics of PyTorch, including tensors and a simple perceptron.
This tutorial assumes knowledge of Python, NumPy, and neural networks.
If you don't know anything about neural networks, I suggest that you watch this amazing video by 3Blue1Brown:
You can install both NumPy and PyTorch using pip. Check out the [PyTorch](https://pytorch.org/) website to get the version of PyTorch that matches the version of CUDA installed on your computer. If you don't have an NVIDIA GPU with CUDA, you can still run PyTorch, but you won't be able to run your programs on the GPU.
```python
!pip install numpy
!pip install torch torchvision
```
Pro tip: in a notebook, prefixing a command with "!" executes it in the terminal.
```python
!pwd
```
```
/home/jeff/Documents/python
```
# Tensors
The core of the PyTorch library centers around tensors. Tensors are analogous to NumPy arrays; however, tensors have two key benefits: they can be placed on the GPU, and they support automatic differentiation (autograd), which makes backpropagation in neural networks much easier to implement.
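Autograd is worth a quick demonstration before we go further. A minimal sketch (the function and values here are just illustrative): flag a tensor with `requires_grad=True`, build an expression from it, and call `backward()` to get the derivative.

```python
import torch

# A tensor flagged with requires_grad tracks operations for autograd.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x
y.backward()         # computes dy/dx and stores it in x.grad
print(x.grad)        # dy/dx = 2x + 3 = 7 at x = 2
```

This is the machinery that later lets us call `loss.backward()` on a network without deriving any gradients by hand.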
Creating an empty tensor is similar to allocating a new C array: the memory is uninitialized, so don't expect it to contain zeros, ones, or even "random" values.
```python
import torch
torch.empty(5, 2)
```
```
tensor([[4.8132e-36, 4.5597e-41],
[1.4906e-11, 3.0957e-41],
[4.4842e-44, 0.0000e+00],
[8.9683e-44, 0.0000e+00],
[7.7759e-13, 3.0957e-41]])
```
If you explicitly create a random tensor, you get values uniformly distributed between zero and one.
```python
x = torch.rand(5, 3)
print(x)
print(x.shape)
```
```
tensor([[0.7825, 0.7864, 0.1257],
[0.7588, 0.6572, 0.9262],
[0.4881, 0.6329, 0.3424],
[0.1333, 0.4235, 0.6760],
[0.9737, 0.6657, 0.9946]])
torch.Size([5, 3])
```
Similarly, there is a function for random integers.
```python
torch.randint(low=0, high=5, size=(3,3))
```
```
tensor([[4, 2, 1],
[2, 0, 3],
[2, 2, 2]])
```
```python
torch.ones(3,1)
```
```
tensor([[1.],
[1.],
[1.]])
```
Similar to NumPy, you can also create an array filled with zeros and specify a data type.
Common data types:
- torch.long
- torch.bool
- torch.float
- torch.int
- torch.int8
- torch.int16
- torch.int32
- torch.int64
- torch.double
```python
x = torch.zeros(5, 2, dtype=torch.long)
print(x)
```
```
tensor([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]])
```
size() returns a torch.Size, which is a subclass of tuple. In PyTorch it is common to call .size(), although .shape returns the same thing.
```python
print(x.size())
print(x.shape)
```
```
torch.Size([5, 2])
torch.Size([5, 2])
```
Like NumPy, PyTorch supports a ton of operators on a tensor.
Check out the documentation at the [official website](https://pytorch.org/docs/stable/tensors.html).
```python
y = torch.rand(5, 2)
x + y                        # same as torch.add(x, y)
result = torch.add(x, y)     # returns a new tensor
torch.add(x, y, out=result)  # writes the result into an existing tensor
```
```
tensor([[0.4942, 0.7370],
[0.9927, 0.7068],
[0.1702, 0.9578],
[0.6510, 0.4992],
[0.2482, 0.4928]])
```
PyTorch also provides in-place versions of the standard operators, named with a trailing underscore ("_"), that modify the calling tensor.
```python
# adds result to y in place, modifying y
y.add_(result)
```
```
tensor([[0.9885, 1.4740],
[1.9855, 1.4135],
[0.3405, 1.9155],
[1.3020, 0.9984],
[0.4964, 0.9856]])
```
Tensors support Pythonic indexing for accessing individual elements, rows, and columns.
```python
print(y[0][0]) # first element as a tensor
print(y[0, 0]) # same thing as y[0][0]
print(y[0][0].item()) # grabs data inside tensor
print(y[:, 0]) # gets first col
print(y[1, :]) # gets second row
```
```
tensor(0.9885)
tensor(0.9885)
0.98846435546875
tensor([0.9885, 1.9855, 0.3405, 1.3020, 0.4964])
tensor([1.9855, 1.4135])
```
You can reshape a tensor using the view function.
```python
print(y.view(1,10))
print(y.view(2,5))
```
```
tensor([[0.9885, 1.4740, 1.9855, 1.4135, 0.3405, 1.9155, 1.3020, 0.9984, 0.4964,
0.9856]])
tensor([[0.9885, 1.4740, 1.9855, 1.4135, 0.3405],
[1.9155, 1.3020, 0.9984, 0.4964, 0.9856]])
```
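view can also infer one dimension for you: pass -1 and PyTorch computes that size from the remaining dimensions. A quick sketch (the tensor here is just illustrative):

```python
import torch

t = torch.arange(12)    # tensor([ 0,  1, ..., 11])
print(t.view(3, 4))     # reshape to 3 rows, 4 columns
print(t.view(2, -1))    # -1 tells PyTorch to infer that dimension (here 6)
```

This is handy when one dimension (such as the batch size) varies at runtime.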
## CUDA
One of the great things about PyTorch is that you can run everything on either the GPU or the CPU. To make code flexible enough to run on either, most people set the device dynamically. Keeping your devices consistent is crucial: you can't combine a "CUDA" tensor and a "CPU" tensor in a single operation. This makes sense because one lives in the GPU's memory while the other lives in the computer's main memory (RAM).
```python
torch.cuda.is_available() # returns True if CUDA is available on this system
```
```
True
```
```python
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")
x.to(device)  # returns a copy of x on the selected device
```
```
tensor([[0.7825, 0.7864, 0.1257],
[0.7588, 0.6572, 0.9262],
[0.4881, 0.6329, 0.3424],
[0.1333, 0.4235, 0.6760],
[0.9737, 0.6657, 0.9946]], device='cuda:0')
```
## NumPy to Tensor
It is possible to convert between NumPy arrays and tensors. Note that the two share the same underlying memory (for CPU tensors): anything done to the NumPy array is reflected in the tensor and vice versa.
```python
import numpy as np
g = np.zeros(5)
gg = torch.from_numpy(g)
print(gg)
```
```
tensor([0., 0., 0., 0., 0.], dtype=torch.float64)
```
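The sharing works in the other direction too: .numpy() gives a NumPy view of a CPU tensor rather than a copy. A small sketch of the round trip:

```python
import torch

t = torch.ones(3)
a = t.numpy()        # a NumPy view of the same memory (CPU tensors only)
t.add_(1)            # an in-place change to the tensor...
print(a)             # ...shows up in the array: [2. 2. 2.]
a[0] = 5.0           # and writes to the array show up in the tensor
print(t)             # tensor([5., 2., 2.])
```

If you need an independent copy instead, clone the tensor first (or copy the array).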
## CUDA Performance
Without question, matrix operations on the GPU are dramatically faster than on the CPU. The following code gives a rough sense of the speed difference.
```python
import time  # times in seconds

def time_torch(size):
    x = torch.rand(size, size, device=torch.device("cuda"))
    start = time.time()
    x.sin_()
    end = time.time()
    return end - start

def time_numpy(size):
    x = np.random.rand(size, size)
    start = time.time()
    np.sin(x, out=x)
    end = time.time()
    return end - start
print(time_numpy(10000))
print(time_torch(10000))
```
```
1.8906972408294678
0.003466367721557617
```
On the CPU, it took 1.9 seconds to take the sine of a 10k by 10k matrix; on my GPU (an NVIDIA GTX 1060), it only took 0.003 seconds!
One caveat: CUDA kernels launch asynchronously, so naive timings like this can understate the GPU's actual work unless you call torch.cuda.synchronize() before stopping the clock.
It is also worth pointing out that there is some overhead when transferring data between the GPU's memory and main memory.
For this reason, when designing algorithms, you should avoid repeatedly swapping data on and off the GPU.
# Basic Perceptron
Now that we have seen tensors, we can look at a basic neural network example.
In this example, we are merely doing linear regression,
i.e., our algorithm takes a single input and tries to predict the output using the equation:
$$
y = mx+b
$$
Here "x" is our input, "m" is the weight, and "b" is the bias.
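Before building the network, we can sanity-check what answer to expect with a closed-form least-squares fit in NumPy; the dummy data used for training later in this post follows y = 3x + 0, so any fit should recover m near 3 and b near 0. A quick sketch:

```python
import numpy as np

# Same (x, y) pairs as the dummy training data used below.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([3, 6, 9, 12, 15, 18], dtype=float)

m, b = np.polyfit(xs, ys, deg=1)  # least-squares fit of a degree-1 polynomial
print(m, b)                       # m ~ 3.0, b ~ 0.0
```

The network we train below should converge to roughly these same two numbers.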
```python
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
```
There is a lot to be said about how you define a neural network in PyTorch; however, most networks follow this basic pattern.
The constructor creates each layer, and the forward function defines how data is transformed as it gets pushed through the network.
(Note: since PyTorch 0.4, `Variable` is deprecated because tensors support autograd directly; the older `Variable` API used below still works, though.)
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 1)  # one input feature, one output feature

    def forward(self, x):
        x = self.fc1(x)
        return x
```
```python
net = Net()
net.cuda() # puts the NN on the GPU
print(net) # displays NN structure
```
```
Net(
(fc1): Linear(in_features=1, out_features=1, bias=True)
)
```
Printing the network like this is useful because you can see the dimensions of the data going into and out of each layer.
```python
print(list(net.parameters()))
```
```
[Parameter containing:
tensor([[0.2431]], device='cuda:0', requires_grad=True), Parameter containing:
tensor([0.3372], device='cuda:0', requires_grad=True)]
```
As expected, we see that the NN has two parameters (a weight and a bias) initialized to random values.
Next, we have to define our loss function before we can train.
The loss measures how far the prediction made by the network is from the target.
For this example, we are using the squared difference; this can be swapped for other loss functions.
The important thing is that the loss is non-negative and shrinks as predictions improve, so that minimizing it via backpropagation drives the network toward the targets.
```python
def criterion(out, label):
    return (label - out) ** 2
```
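To see the loss on concrete numbers, here is a quick check of the squared-difference criterion (redefined here so the snippet is self-contained; the prediction and target values are just illustrative):

```python
import torch

def criterion(out, label):    # same squared-difference loss as above
    return (label - out) ** 2

out = torch.tensor([2.5])     # hypothetical network prediction
label = torch.tensor([3.0])   # target value
print(criterion(out, label))  # (3.0 - 2.5)^2 = 0.25
```

Being off by 0.5 costs 0.25; being off by 5 would cost 25, so larger errors are penalized disproportionately.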
An optimizer is a PyTorch object that adjusts the neural network's weights based on the gradient of the loss with respect to each weight.
```python
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.5)
```
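To make the optimizer less of a black box, here is a sketch of a single plain SGD step (no momentum; the parameter value and toy loss are illustrative). step() applies w ← w − lr · ∂loss/∂w using the gradient that backward() stored:

```python
import torch
from torch import optim

w = torch.tensor([1.0], requires_grad=True)  # a single scalar parameter
opt = optim.SGD([w], lr=0.1)

opt.zero_grad()
loss = (3.0 * w - 6.0) ** 2  # toy squared error
loss.backward()              # d(loss)/dw = 2 * (3w - 6) * 3 = -18 at w = 1
opt.step()                   # w <- 1.0 - 0.1 * (-18) = 2.8
print(w)
```

Momentum (as used in the optimizer above) additionally mixes in a fraction of the previous update, which helps smooth out noisy gradients.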
The dummy data simply follows the equation y = 3x + 0.
```python
data = [(1,3), (2,6), (3,9), (4,12), (5,15), (6,18)]
```
All training loops look something like the following example. An epoch is one full pass of the neural network over the training data. Optionally, people include another outer loop for "runs": this repeats the whole training process several times to see whether we always converge on the same answer or sometimes get stuck in local minima.
```python
for epoch in range(100):
    for i, data2 in enumerate(data):
        X, Y = data2
        X, Y = Variable(torch.FloatTensor([X]), requires_grad=True).cuda(), Variable(torch.FloatTensor([Y])).cuda()
        optimizer.zero_grad()
        outputs = net(X)
        loss = criterion(outputs, Y)
        loss.backward()
        optimizer.step()
        if (i % 10 == 0):
            print("Epoch {} - loss: {}".format(epoch, loss.item()))
```
```
Epoch 0 - loss: 5.854729652404785
Epoch 1 - loss: 2.294259548187256
Epoch 2 - loss: 0.5001814961433411
Epoch 3 - loss: 0.8155164122581482
Epoch 4 - loss: 0.6028059720993042
...
Epoch 95 - loss: 4.468947372515686e-05
Epoch 96 - loss: 4.02352525270544e-05
Epoch 97 - loss: 3.622113945311867e-05
Epoch 98 - loss: 3.2605526939732954e-05
Epoch 99 - loss: 2.934764779638499e-05
```
```python
print(list(net.parameters()))
```
```
[Parameter containing:
tensor([[2.9989]], device='cuda:0', requires_grad=True), Parameter containing:
tensor([0.0063], device='cuda:0', requires_grad=True)]
```
We can see that our NN has estimated the data generated by "y = 3x" to be "y = 2.999x + 0.006".
We can now use this network to make predictions.
Note: the shape of the input has to match what the forward function expects, and the input tensor must be on the same device as the network.
```python
input = Variable(torch.ones(1,1,1).cuda())
print(input)
print(net(input))
```
```
tensor([[[1.]]], device='cuda:0')
tensor([[[3.0051]]], device='cuda:0', grad_fn=<...>)
```
From this example, we can quickly create more intricate neural networks. By adding a few lines of code, we can create a multi-layer network. (Note that without a nonlinear activation such as F.relu between the layers, two stacked linear layers still compute a linear function; that is fine for our linear data, but a real multi-layer perceptron inserts an activation between layers.)
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.fc2(self.fc1(x))  # no activation between layers, so still linear
        return x
```
```python
net = Net().cuda()
net
```
```
Net(
(fc1): Linear(in_features=1, out_features=10, bias=True)
(fc2): Linear(in_features=10, out_features=1, bias=True)
)
```
```python
criterion = nn.MSELoss()  # mean squared error loss
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.01)
for epoch in range(100):
    for i, data2 in enumerate(data):
        X, Y = data2
        X, Y = Variable(torch.FloatTensor([X]), requires_grad=True).cuda(), Variable(torch.FloatTensor([Y])).cuda()
        optimizer.zero_grad()
        outputs = net(X)
        loss = criterion(outputs, Y)
        loss.backward()
        optimizer.step()
        if (i % 10 == 0):
            print("Epoch {} - loss: {}".format(epoch, loss.item()))
```
```
Epoch 0 - loss: 7.190098285675049
Epoch 1 - loss: 1.51701192407927e-06
Epoch 2 - loss: 0.1253584325313568
Epoch 3 - loss: 0.5402220487594604
Epoch 4 - loss: 1.1704645156860352
...
Epoch 95 - loss: 3.3565470403118525e-09
Epoch 96 - loss: 5.913989298278466e-10
Epoch 97 - loss: 1.8417267710901797e-11
Epoch 98 - loss: 2.3283064365386963e-10
Epoch 99 - loss: 7.130438461899757e-10
```