This post dives headfirst into PyTorch, a powerful open-source machine learning library for Python. Google's flagship ML library is TensorFlow, whereas Facebook developed PyTorch. Researchers are gravitating towards PyTorch due to its flexibility and efficiency. This tutorial goes over the basics of PyTorch, including tensors and a simple perceptron.

This tutorial requires knowledge of Python, NumPy, and neural networks. If you don't know anything about neural networks, I suggest that you watch this amazing video by 3Blue1Brown:

<youtube src="aircAruvnKk" />

You can install both NumPy and PyTorch using pip. Check out the [PyTorch](https://pytorch.org/) website to get the build of PyTorch that matches the version of CUDA installed on your computer. If you don't have an NVIDIA GPU with CUDA, you can still run PyTorch, but your programs will only run on the CPU.

```python
!pip install numpy
!pip install torch torchvision
```

Pro tip: in a Jupyter notebook, starting a line with "!" runs it as a shell command instead of as Python.

```python
!pwd
```

```
/home/jeff/Documents/python
```

# Tensors

The core of the PyTorch library is centered around tensors. Tensors are analogous to NumPy arrays; however, the benefit of tensors is that they can be placed on the GPU. Tensors also support automatic differentiation (autograd): PyTorch records the operations performed on tensors and computes the gradients for you, which saves you from deriving and coding backpropagation by hand.
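
As a minimal sketch of autograd (the variable names and numbers are made up for illustration), the snippet below computes a squared error for a single weight and lets `backward()` fill in the gradient. By hand, the derivative of (wx - y)^2 with respect to w is 2(wx - y)x, which is -16 at w = 1, x = 2, y = 6.

```python
import torch

# A single trainable weight; requires_grad=True tells autograd to track it
w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)  # input
y = torch.tensor(6.0)  # target

loss = (w * x - y) ** 2  # forward pass records the computation graph
loss.backward()          # backward pass stores d(loss)/dw in w.grad

print(loss.item())    # 16.0
print(w.grad.item())  # -16.0, matching 2 * (w*x - y) * x
```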

Creating an empty tensor is similar to allocating a new C array: the memory is not initialized, so whatever happened to be there is what you get. Don't expect it to be zeros, ones, or "random."

```python
import torch
torch.empty(5, 2)
```

```
tensor([[4.8132e-36, 4.5597e-41],
        [1.4906e-11, 3.0957e-41],
        [4.4842e-44, 0.0000e+00],
        [8.9683e-44, 0.0000e+00],
        [7.7759e-13, 3.0957e-41]])
```

If you explicitly create a random tensor with `torch.rand`, its values are drawn uniformly from the interval [0, 1).

```python
x = torch.rand(5, 3)
print(x)
print(x.shape)
```

```
tensor([[0.7825, 0.7864, 0.1257],
        [0.7588, 0.6572, 0.9262],
        [0.4881, 0.6329, 0.3424],
        [0.1333, 0.4235, 0.6760],
        [0.9737, 0.6657, 0.9946]])
torch.Size([5, 3])
```

Similarly, `torch.randint` creates a tensor of random integers drawn from `low` (inclusive) up to `high` (exclusive), so the call below can only produce the values 0 through 4.

```python
torch.randint(low=0, high=5, size=(3, 3))
```

```
tensor([[4, 2, 1],
        [2, 0, 3],
        [2, 2, 2]])
```
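
A quick way to convince yourself that `high` is exclusive (a sanity check, not something you would keep in real code) is to draw many samples and look at the extremes. The sample size of 10,000 is an arbitrary choice.

```python
import torch

samples = torch.randint(low=0, high=5, size=(10_000,))
print(samples.min().item(), samples.max().item())  # almost certainly 0 4

# high is exclusive, so the value 5 never appears
print((samples == 5).any().item())  # False
```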

`torch.ones` works the same way but fills the tensor with ones.

```python
torch.ones(3, 1)
```

```
tensor([[1.],
        [1.],
        [1.]])
```

Similar to NumPy, you can also create a tensor filled with zeros and specify its data type.

Common data types:

- torch.long (an alias for torch.int64)
- torch.bool
- torch.float (an alias for torch.float32)
- torch.int (an alias for torch.int32)
- torch.int8
- torch.int16
- torch.int32
- torch.int64
- torch.double (an alias for torch.float64)

```python
x = torch.zeros(5, 2, dtype=torch.long)
print(x)
```

```
tensor([[0, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0]])
```
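
To confirm the aliases noted in the list and to convert between types, you can compare dtypes directly and use `.to()`. A short sketch (the variable names are only for illustration):

```python
import torch

# The short names are aliases for the sized types
print(torch.long == torch.int64)      # True
print(torch.float == torch.float32)   # True
print(torch.double == torch.float64)  # True

x = torch.zeros(5, 2, dtype=torch.long)
x_float = x.to(torch.float32)  # returns a converted tensor; x keeps its dtype
print(x.dtype, x_float.dtype)  # torch.int64 torch.float32
print(torch.rand(2).dtype)     # torch.float32, the default floating-point type
```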

`size()` returns a `torch.Size`, which is a subclass of Python's tuple. In PyTorch it is common to call `.size()`; however, the `.shape` attribute returns the same thing.

```python
print(x.size())
print(x.shape)
```

```
torch.Size([5, 2])
torch.Size([5, 2])
```
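
Because `torch.Size` behaves like a tuple, you can index it, unpack it, or pass a dimension to `size()` directly. A small illustration using the same 5 by 2 zeros tensor:

```python
import torch

x = torch.zeros(5, 2, dtype=torch.long)
rows, cols = x.shape               # tuple unpacking
print(rows, cols)                  # 5 2
print(x.size(0), x.shape[1])       # 5 2, one dimension at a time
print(isinstance(x.shape, tuple))  # True
print(x.numel())                   # 10, the total number of elements
```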

Like NumPy, PyTorch supports a ton of operations on tensors. Check out the documentation on the [official website](https://pytorch.org/docs/stable/tensors.html).

```python
y = torch.rand(5, 2)
x + y                        # operator form, same as torch.add(x, y)
result = torch.add(x, y)     # function form, returns a new tensor
torch.add(x, y, out=result)  # writes the sum into an existing tensor; x is all zeros, so this equals y
```

```
tensor([[0.4942, 0.7370],
        [0.9927, 0.7068],
        [0.1702, 0.9578],
        [0.6510, 0.4992],
        [0.2482, 0.4928]])
```

PyTorch also provides in-place versions of many operations. Their names end in an underscore (for example, `add_`), and they modify the tensor they are called on instead of returning a new one.

```python
# adds result to y in place
y.add_(result)
```

```
tensor([[0.9885, 1.4740],
        [1.9855, 1.4135],
        [0.3405, 1.9155],
        [1.3020, 0.9984],
        [0.4964, 0.9856]])
```
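
The difference between the two forms is easy to check: `add` leaves the original tensor alone, while `add_` overwrites its data and returns a tensor backed by the same memory. A sketch with made-up values:

```python
import torch

a = torch.ones(3)
b = torch.full((3,), 2.0)

c = a.add(b)   # out-of-place: a is unchanged
print(a)       # tensor([1., 1., 1.])
print(c)       # tensor([3., 3., 3.])

d = a.add_(b)  # in-place: a itself now holds the sum
print(a)       # tensor([3., 3., 3.])
print(d.data_ptr() == a.data_ptr())  # True, same underlying memory
```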

Tensors support Python-style indexing and slicing.

```python
print(y[0][0])         # first element, still wrapped in a tensor
print(y[0, 0])         # same thing as y[0][0]
print(y[0][0].item())  # .item() extracts the Python number from a one-element tensor
print(y[:, 0])         # first column
print(y[1, :])         # second row
```

```
tensor(0.9885)
tensor(0.9885)
0.98846435546875
tensor([0.9885, 1.9855, 0.3405, 1.3020, 0.4964])
tensor([1.9855, 1.4135])
```

You can reshape a tensor with the `view` function, as long as the total number of elements stays the same (here 5 × 2 = 10). The result shares memory with the original tensor rather than copying it.

```python
print(y.view(1, 10))
print(y.view(2, 5))
```

```
tensor([[0.9885, 1.4740, 1.9855, 1.4135, 0.3405, 1.9155, 1.3020, 0.9984, 0.4964,
         0.9856]])
tensor([[0.9885, 1.4740, 1.9855, 1.4135, 0.3405],
        [1.9155, 1.3020, 0.9984, 0.4964, 0.9856]])
```
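
Two details of `view` are worth seeing in code: passing `-1` for one dimension lets PyTorch infer it from the others, and because the result is a view, writing to it changes the original tensor. The tensor `t` below is a stand-in for `y`, chosen so the values are predictable.

```python
import torch

t = torch.arange(10.0).view(5, 2)  # a 5x2 stand-in for y above

flat = t.view(-1)           # -1 infers the size: 10 elements -> shape (10,)
print(flat.shape)           # torch.Size([10])
print(t.view(2, -1).shape)  # torch.Size([2, 5])

flat[0] = 100.0             # a view shares memory with the original tensor
print(t[0, 0])              # tensor(100.)

try:
    t.view(3, 3)            # 9 != 10 elements, so this cannot work
except RuntimeError as err:
    print("view failed:", err)
```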

## CUDA

One of the great things about PyTorch is that you can run everything on either the GPU or the CPU. To make code flexible enough to run on either device, most people set the device dynamically. Keeping your devices consistent is crucial because you can't combine a CUDA tensor with a CPU tensor in a single operation. This makes sense: one lives in the GPU's memory, while the other lives in the computer's main memory (RAM).

```python
torch.cuda.is_available()  # True if PyTorch can use a CUDA GPU on this system
```

```
True
```

```python
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")
x = x.to(device)  # .to() returns a tensor on the selected device rather than moving x in place
x
```

```
tensor([[0, 0],
        [0, 0],
        [0, 0],
        [0, 0],
        [0, 0]], device='cuda:0')
```
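
A common device-agnostic pattern, sketched below, is to pick the device once and then create or move every tensor there, so that operands always live on the same device. It falls back to the CPU when no GPU is present; the names `a`, `b`, and `c` are only for illustration.

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 2, device=device)  # allocate directly on the device
b = torch.rand(2, 2).to(device)      # or create on the CPU and move it over
c = a + b                            # both operands share a device, so this works

print(c.device)
c_cpu = c.cpu()  # copy back to main memory, e.g. before converting to NumPy
```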

## NumPy to Tensor

It is possible to switch between NumPy arrays and tensors. Note that `torch.from_numpy` does not copy the data: the tensor shares memory with the NumPy array, so anything done to the array is reflected in the tensor and vice versa.

```python
import numpy as np
g = np.zeros(5)
gg = torch.from_numpy(g)
print(gg)
```

```
tensor([0., 0., 0., 0., 0.], dtype=torch.float64)
```
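
The shared memory is easy to demonstrate. The sketch below also uses `Tensor.numpy()` for the opposite direction, and `torch.tensor()`, which, unlike `from_numpy`, always copies its input.

```python
import numpy as np
import torch

g = np.zeros(5)
gg = torch.from_numpy(g)  # shares memory with g
g[0] = 7.0                # modify the NumPy array...
print(gg)                 # ...and the tensor sees it: tensor([7., 0., 0., 0., 0.], dtype=torch.float64)

back = gg.numpy()         # tensor -> NumPy, also without copying (CPU tensors only)
gg[1] = 3.0
print(back[1])            # 3.0

copied = torch.tensor(g)  # torch.tensor always makes a copy
g[2] = 5.0
print(copied[2].item())   # 0.0, the copy is unaffected
```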

## CUDA Performance

For large matrices, element-wise and matrix operations on the GPU can be dramatically faster than on the CPU. The following code is an example of the speed difference.

```python
import time  # times in seconds

import numpy as np
import torch

def time_torch(size):
    x = torch.rand(size, size, device=torch.device("cuda"))
    torch.cuda.synchronize()  # make sure the setup work has finished
    start = time.time()
    x.sin_()
    # CUDA kernels run asynchronously; wait for sin_ to finish before stopping the clock
    torch.cuda.synchronize()
    end = time.time()
    return end - start

def time_numpy(size):
    x = np.random.rand(size, size)  # NumPy defaults to float64, torch.rand to float32
    start = time.time()
    np.sin(x, out=x)
    end = time.time()
    return end - start

print(time_numpy(10000))
print(time_torch(10000))
```

```
1.8906972408294678
0.003466367721557617
```

On the CPU, it took 1.9 seconds to take the sine of a 10k by 10k matrix; on my GPU (an NVIDIA GTX 1060), it took only 0.003 seconds!

It is worth pointing out that there is some overhead when transferring data between the GPU's memory and the main memory. For this reason, when designing algorithms, you should avoid moving data on and off the GPU.

# Basic Perceptron

Now that we have seen tensors, we can look at a basic neural network example. In this example, we are merely going to do linear regression. That is, our model takes in a single input and tries to predict the output using the equation:

$$
y = mx + b
$$

where "x" is our input, "m" is the weight, and "b" is the bias.
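
In PyTorch terms, `nn.Linear(1, 1)` stores "m" as its `weight` (shape 1 by 1) and "b" as its `bias` (shape 1). As a sanity check of that reading, the sketch below compares the layer's output with the equation computed by hand; the input values are arbitrary.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)  # one input feature, one output feature
m = layer.weight         # shape (1, 1): the "m" in y = mx + b
b = layer.bias           # shape (1,):   the "b" in y = mx + b

xs = torch.tensor([[1.0], [2.0], [3.0]])  # three samples, one feature each
by_layer = layer(xs)
by_hand = xs @ m.T + b   # nn.Linear computes x W^T + b

print(torch.allclose(by_layer, by_hand))  # True
```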

```python
import torch
from torch.autograd import Variable  # legacy wrapper; since PyTorch 0.4 plain tensors work directly
import torch.nn as nn
import torch.nn.functional as F
```

There is a lot to be said about how you define a neural network in PyTorch; however, most follow this basic example. The constructor creates each layer, and the forward function defines how data gets calculated as it gets pushed through the network.

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        x = self.fc1(x)
        return x
```

```python
net = Net()
net.cuda()  # puts the NN on the GPU
print(net)  # displays NN structure
```

```
Net(
  (fc1): Linear(in_features=1, out_features=1, bias=True)
)
```

Printing the network like this is useful because you can see the number of features going into and out of each layer.

```python
print(list(net.parameters()))
```

```
[Parameter containing:
tensor([[0.2431]], device='cuda:0', requires_grad=True), Parameter containing:
tensor([0.3372], device='cuda:0', requires_grad=True)]
```

As expected, the network has two parameters, the weight and the bias, both initialized to random values.

Next, we have to define our loss function before we can train. The loss measures how far the network's prediction is from the correct answer. For this example, we are using the squared difference; this can be swapped for other loss functions. The important thing is that the loss is differentiable and smallest when the prediction is correct, so that backpropagation and gradient descent can push the network toward better predictions. The squared difference fits: it is never negative and is zero only when the prediction is exactly right.

```python
def criterion(out, label):
    return (label - out) ** 2  # squared error
```
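
For a single prediction, this hand-written criterion gives the same value as PyTorch's built-in mean squared error (`F.mse_loss`, the function behind `nn.MSELoss`), since the mean of one squared difference is just that squared difference. A quick check with arbitrary numbers:

```python
import torch
import torch.nn.functional as F

def criterion(out, label):
    return (label - out) ** 2  # squared error, as defined above

out = torch.tensor([2.5])
label = torch.tensor([3.0])

print(criterion(out, label))   # tensor([0.2500])
print(F.mse_loss(out, label))  # tensor(0.2500)
```

For batches of several predictions the two differ: `F.mse_loss` averages over all elements, while `criterion` returns one squared error per element.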

An optimizer is an object provided by PyTorch that adjusts the neural network's weights using the gradients of the loss with respect to those weights.

```python
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.5)  # stochastic gradient descent
```
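
Without momentum, one SGD step simply subtracts the learning rate times the gradient from each parameter. The sketch below checks that against an update computed by hand, using a single made-up weight and data point rather than the network above (and a separate optimizer named `sgd`, so the `optimizer` defined above is left alone). With momentum, PyTorch's SGD additionally keeps a running buffer of past gradients and steps along that buffer instead.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)  # plain SGD, no momentum

# Squared error for one made-up sample: input 2, target 3, prediction w * 2
step_loss = ((3.0 - w * 2.0) ** 2).sum()
sgd.zero_grad()
step_loss.backward()  # d(loss)/dw = -4 * (3 - 2w) = -4 at w = 1

expected = w.detach() - 0.1 * w.grad  # the SGD update rule applied by hand
sgd.step()
print(w)         # tensor([1.4000], requires_grad=True)
print(expected)  # tensor([1.4000])
```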

The dummy data simply follows the equation y = 3x + 0.

```python
data = [(1, 3), (2, 6), (3, 9), (4, 12), (5, 15), (6, 18)]
```

Most training loops look something like the following example. An epoch is one full pass of the neural network over the training data. Optionally, people include another outer loop for "runs", which repeats the whole training process several times to see whether it always converges on the same answer or sometimes gets stuck in a local minimum.

```python
for epoch in range(100):
    for i, (X, Y) in enumerate(data):
        # Wrap each number in a one-element float tensor on the GPU
        X = torch.tensor([X], dtype=torch.float32).cuda()
        Y = torch.tensor([Y], dtype=torch.float32).cuda()
        optimizer.zero_grad()         # clear gradients from the previous step
        outputs = net(X)              # forward pass
        loss = criterion(outputs, Y)
        loss.backward()               # backpropagation: compute gradients
        optimizer.step()              # update the weights
        if i % 10 == 0:               # with 6 samples, this prints once per epoch
            print("Epoch {} - loss: {}".format(epoch, loss.item()))
```

```
Epoch 0 - loss: 5.854729652404785
Epoch 1 - loss: 2.294259548187256
Epoch 2 - loss: 0.5001814961433411
Epoch 3 - loss: 0.8155164122581482
Epoch 4 - loss: 0.6028059720993042
...
Epoch 95 - loss: 4.468947372515686e-05
Epoch 96 - loss: 4.02352525270544e-05
Epoch 97 - loss: 3.622113945311867e-05
Epoch 98 - loss: 3.2605526939732954e-05
Epoch 99 - loss: 2.934764779638499e-05
```
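
Feeding the six points one at a time mirrors classic stochastic gradient descent. As an alternative sketch (not what produced the output above), you can stack all of the points into a single (6, 1) tensor and train on the whole batch at once. This version runs on the CPU and uses its own names (`model`, `opt`, `loss_fn`) so it does not disturb `net`; the learning rate and epoch count are untuned choices.

```python
import torch
import torch.nn as nn

points = [(1, 3), (2, 6), (3, 9), (4, 12), (5, 15), (6, 18)]  # same data as above
inputs = torch.tensor([[float(p[0])] for p in points])   # shape (6, 1)
targets = torch.tensor([[float(p[1])] for p in points])  # shape (6, 1)

model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()  # averages the squared error over the batch

for epoch in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)  # one forward pass for all 6 points
    loss.backward()
    opt.step()

print(model.weight.item(), model.bias.item())  # close to 3 and 0
```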

```python
print(list(net.parameters()))
```

```
[Parameter containing:
tensor([[2.9989]], device='cuda:0', requires_grad=True), Parameter containing:
tensor([0.0063], device='cuda:0', requires_grad=True)]
```

We can see that our NN has estimated the data generated by "y = 3x" to be "y = 2.999x + 0.006". We can now use this network to make predictions. Note: the last dimension of the input has to match the number of input features the first layer expects (here, 1), and the input tensor must be on the same device as the network.

```python
sample = torch.ones(1, 1, 1).cuda()  # a single input x = 1, on the GPU
print(sample)
print(net(sample))
```

```
tensor([[[1.]]], device='cuda:0')
tensor([[[3.0051]]], device='cuda:0', grad_fn=<AddBackward0>)
```
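
The `grad_fn` in that output shows PyTorch is still recording the computation for a backward pass that prediction does not need. Wrapping inference in `torch.no_grad()` skips that bookkeeping. The sketch below uses a hypothetical stand-in layer, `lin`, whose weight and bias are set by hand to 3 and 0 so the predictions are exact, and it runs on the CPU.

```python
import torch
import torch.nn as nn

lin = nn.Linear(1, 1)  # stand-in for the trained network
with torch.no_grad():
    lin.weight.fill_(3.0)  # m = 3
    lin.bias.fill_(0.0)    # b = 0

xs = torch.tensor([[1.0], [2.0], [10.0]])  # a batch of three inputs, shape (3, 1)
with torch.no_grad():
    preds = lin(xs)

print(preds.requires_grad)        # False: no graph was recorded
print(preds.squeeze(1).tolist())  # [3.0, 6.0, 30.0]
```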

From this example, we can quickly create more intricate neural networks. By adding a few lines of code, we can create a multi-layer perceptron. Note that without a nonlinear activation function (such as `F.relu`) between the two layers, stacking linear layers still produces a linear function, which happens to be all this dataset needs.

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.fc2(self.fc1(x))
        return x
```

```python
net = Net().cuda()
net
```

```
Net(
  (fc1): Linear(in_features=1, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=1, bias=True)
)
```
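
The printed sizes also tell you how many trainable numbers the model has: `fc1` has 1 × 10 weights plus 10 biases, and `fc2` has 10 × 1 weights plus 1 bias, for 31 in total. A quick way to count them, using an `nn.Sequential` stand-in (named `mlp` here for illustration) with the same two layers:

```python
import torch.nn as nn

# Same layer sizes as Net above: Linear(1, 10) followed by Linear(10, 1)
mlp = nn.Sequential(nn.Linear(1, 10), nn.Linear(10, 1))

n_params = sum(p.numel() for p in mlp.parameters() if p.requires_grad)
print(n_params)  # 31 = (1*10 + 10) + (10*1 + 1)
```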

```python
criterion = nn.MSELoss()  # mean squared error for loss function
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.01)
for epoch in range(100):
    for i, (X, Y) in enumerate(data):
        X = torch.tensor([X], dtype=torch.float32).cuda()
        Y = torch.tensor([Y], dtype=torch.float32).cuda()
        optimizer.zero_grad()
        outputs = net(X)
        loss = criterion(outputs, Y)
        loss.backward()
        optimizer.step()
        if i % 10 == 0:  # prints once per epoch
            print("Epoch {} - loss: {}".format(epoch, loss.item()))
```

```
Epoch 0 - loss: 7.190098285675049
Epoch 1 - loss: 1.51701192407927e-06
Epoch 2 - loss: 0.1253584325313568
Epoch 3 - loss: 0.5402220487594604
Epoch 4 - loss: 1.1704645156860352
...
Epoch 95 - loss: 3.3565470403118525e-09
Epoch 96 - loss: 5.913989298278466e-10
Epoch 97 - loss: 1.8417267710901797e-11
Epoch 98 - loss: 2.3283064365386963e-10
Epoch 99 - loss: 7.130438461899757e-10
```
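
To make the network genuinely non-linear, you can put an activation function between the layers. The sketch below is one way to do that with `F.relu`; the class name `NonLinearNet` is hypothetical. Since this dataset is a straight line, the plain model above already fits it, so the activation matters mainly for data that is not linear.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLinearNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU between the layers breaks linearity
        return self.fc2(x)

model = NonLinearNet()
out = model(torch.rand(4, 1))  # a batch of 4 one-feature inputs
print(out.shape)  # torch.Size([4, 1])
```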