PyTorch/Printable version


PyTorch

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/PyTorch

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Introduction


PyTorch, aka pytorch, is a package for deep learning. It can also be used for shallow learning, for optimization tasks unrelated to deep learning, and for general linear algebra calculations with or without CUDA.

PyTorch is one of many packages for deep learning. As of November 2018, it was second after TensorFlow by number of contributors, and third after TensorFlow and Caffe by number of stars on GitHub [1]. Keras should also be mentioned here.

PyTorch descended from the Torch package, written in a language called Lua. For that reason, pytorch is called torch within Python. For example, import torch but conda update pytorch.

To install PyTorch, go to its official page, [pytorch.org]. Unfortunately, you cannot install PyTorch with sudo apt install.

Advantages and disadvantages

PyTorch is simpler and easier to learn than TensorFlow, and offers more freedom. As Kirill Dubovikov wrote[2],

Overall, the [PyTorch] framework is more tightly integrated with Python language and feels more native most of the times. When you write in TensorFlow sometimes you feel that your model is behind a brick wall with several tiny holes to communicate over.

On the other hand, he wrote:

Currently, TensorFlow is considered as a to-go tool by many researchers and industry professionals. The framework is well documented and if the documentation will not suffice there are many extremely well-written tutorials on the internet. You can find hundreds of implemented and trained models on github, start here.

PyTorch is relatively new compared to its competitor (and is still in beta), but it is quickly getting its momentum. Documentation and official tutorials are also nice. PyTorch also include several implementations of popular computer vision architectures which are super-easy to use.

As of September 2019, PyTorch is no longer in beta, but the difference still holds.

TensorFlow has a great visualization tool, TensorBoard. Starting from version 1.1, PyTorch also fully supports TensorBoard.
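
A minimal sketch of logging to TensorBoard from PyTorch (this assumes PyTorch 1.1 or later and the tensorboard package installed; the tag name and loss values are made up for illustration):

   from torch.utils.tensorboard import SummaryWriter
   writer = SummaryWriter()                   # writes event files to ./runs/ by default
   for step in range(100):
       loss = 1.0 / (step + 1)                # dummy value standing in for a real training loss
       writer.add_scalar('train/loss', loss, step)
   writer.close()
   # view the logs with:  tensorboard --logdir=runs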



Tensor


The basic object in PyTorch is the tensor. Tensors are similar to numpy arrays, with two important additions: they work with CUDA, and they can calculate gradients.

Tensors are created and manipulated similarly to numpy arrays:

>>> import numpy as np
>>> import torch
>>> import time
>>> a = np.random.rand(10000, 10000).astype(np.float32)
>>> b = np.random.rand(10000, 10000).astype(np.float32)
>>> t = time.time(); c = np.matmul(a, b); time.time()-t
7.447854280471802
>>> a1 = torch.rand(10000, 10000, dtype=torch.float32) # note how torch.rand supports dtype
>>> b1 = torch.rand(10000, 10000, dtype=torch.float32)
>>> t = time.time(); c1 = torch.matmul(a1, b1); time.time()-t
7.758733749389648

All functions like np.ones, np.zeros, np.empty and so on, as well as other main functions and arithmetic operators, are also present in torch:

   >>> torch.ones(2,2)
   tensor([[1., 1.],
           [1., 1.]])
   >>> torch.ones(2,2, dtype=torch.int32)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int32)
   >>> a=torch.ones(2,2) # or torch.ones((2,2)) which is the same
   >>> b=a+1
   >>> c=a*b
   >>> c.reshape(1,4) # or c.view(1,4) which is the same
   tensor([[2., 2., 2., 2.]])

For tensors, size is a method that returns a torch.Size object, rather than an attribute holding a tuple. This is convenient, because torch.Size inherits from tuple and has some additional methods defined:

>>> a=torch.ones(2,3,4)
>>> a.size()
torch.Size([2, 3, 4])
>>> a.size().numel()
24
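
Since torch.Size inherits from tuple, it also behaves like one; for example (continuing the session above; .shape is an alias for size()):

>>> h, w, d = a.size()  # a torch.Size unpacks like an ordinary tuple
>>> (h, w, d)
(2, 3, 4)
>>> a.shape
torch.Size([2, 3, 4])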

The functions sum(), mean(), and so on for tensors return not a number but a zero-dimensional tensor. Tensor elements are also zero-dimensional tensors rather than numbers:

   >>> a = torch.ones(2,2)
   >>> a.sum()
   tensor(4.)
   >>> a.sum().size()
   torch.Size([])
   >>> a.sum().dim() 
   0
   >>> a[0,0]
   tensor(1.)
To convert a zero-dimensional tensor to a Python number, you should explicitly call the item method:
   >>> a.sum().item()
   4.0

Instead of numpy's astype, torch has the method to:

   >>> a.to(torch.int16)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int16)

The name is different because to can do more than just change element types. It can also move data to and from CUDA memory, and it works for a wide range of torch objects, including neural networks.
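
For example, to can change the element type and move the tensor to the GPU in a single call (the second call below assumes a CUDA device is available; devices are covered in the CUDA section):

   >>> a.to(torch.float64)
   tensor([[1., 1.],
           [1., 1.]], dtype=torch.float64)
   >>> a.to('cuda', torch.float64)
   tensor([[1., 1.],
           [1., 1.]], device='cuda:0', dtype=torch.float64)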

Tensors and numpy arrays

Since tensors and numpy arrays are so similar, it would be nice if we could convert them to each other. And we, indeed, can, and it is easy. To convert a tensor to an array, just call its numpy method. For the opposite direction, call the torch.tensor constructor:

   >>> a=torch.ones(2,2, dtype=torch.float16)
   >>> a.numpy()
   array([[1., 1.],
          [1., 1.]], dtype=float16)
   >>> b=np.ones((2,2), dtype=np.float16)
   >>> torch.tensor(b)
   tensor([[1., 1.],
           [1., 1.]], dtype=torch.float16)
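
Note that torch.tensor copies the data. If you want the tensor to share memory with the numpy array instead, there is also torch.from_numpy; a small sketch, continuing the session above:

   >>> t = torch.from_numpy(b)   # t shares memory with b
   >>> b[0, 0] = 7
   >>> t
   tensor([[7., 1.],
           [1., 1.]], dtype=torch.float16)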

CUDA

While you can use PyTorch without CUDA, a GPU typically accelerates the computations by a factor of 10-20.

Before using CUDA, check whether it is available. Type:

   torch.cuda.is_available()

If it returns False, you may skip the rest of this section.

You may also check the versions of CUDA and the cuDNN library:

   >>> torch.version.cuda
   '10.0'
   >>> torch.backends.cudnn.version()
   7401
   >>> torch.backends.cudnn.enabled
   True

Unlike numpy arrays, tensors can easily be moved to and from CUDA memory. On the GPU, you can do almost everything you can do on the CPU. If your computer is equipped with a CUDA-capable GPU and you have installed the driver (NVIDIA CUDA 10.0 or higher), you can do the following:

import torch
import time

cuda = torch.device('cuda')
a = torch.randn(10000, 10000, device=cuda)
b = torch.randn(10000, 10000, device=cuda)
t = time.time(); c = torch.matmul(a, b); print(time.time()-t)

On my computer, the time was 0.4 seconds, which is about 2.5×10^12 multiplications per second (10000·10000·10000 = 10^12 multiplications in 0.4 seconds).

You can easily move tensors to and from CUDA memory with the to method:

>>> cuda = torch.device('cuda')
>>> cpu = torch.device('cpu')
>>> a = torch.ones(5,5)
>>> b = a.to(cuda) # move to cuda
>>> c = b.to(cpu) # move back to cpu
>>> a.device
device(type='cpu')
>>> b.device
device(type='cuda')
>>> c.device
device(type='cpu')

You cannot mix CUDA and CPU tensors in your expressions:

>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float
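
A common way to keep the same code working whether or not CUDA is available is to choose the device once and create (or move) all tensors on it; a minimal sketch:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.ones(5, 5, device=device)
b = torch.ones(5, 5, device=device)
c = a + b  # both tensors live on the same device, so this always works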

Autograd

The autograd module built into PyTorch makes calculating gradients via backpropagation straightforward. You only need to specify the requires_grad parameter ("requires" with -s, "grad" without) when creating a tensor, and then call the backward method.

>>> a=torch.ones(2,2, requires_grad=True)
>>> b=torch.eye(2,2, requires_grad=True)
>>> c = a*a*(b+1)
>>> d=c.sum() 
>>> d.backward() # calculate gradients
>>> a.grad # gradient of d with respect to a
tensor([[4., 2.],
        [2., 4.]])
>>> b.grad # gradient of d with respect to b
tensor([[1., 1.],
        [1., 1.]])
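
The computed gradients are typically used to update the tensors, for example by a gradient-descent step. A minimal sketch, continuing the session above (the learning rate 0.1 is an arbitrary choice); the update is done inside torch.no_grad() so that it is not itself tracked by autograd:

>>> with torch.no_grad():
...     a -= 0.1 * a.grad  # gradient-descent step on a
...
>>> a
tensor([[0.6000, 0.8000],
        [0.8000, 0.6000]], requires_grad=True)
>>> a.grad.zero_()  # reset the gradient before the next backward pass
tensor([[0., 0.],
        [0., 0.]])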

In-place operators