

PyTorch, aka pytorch, is a package for deep learning. It can also be used for shallow learning, for optimization tasks unrelated to deep learning, and for general linear algebra calculations with or without CUDA.

PyTorch is one of many packages for deep learning. As of November 2018, it ranked second after TensorFlow by number of contributors, and third after TensorFlow and Caffe by number of stars on GitHub [1]. Keras should also be mentioned here.

PyTorch descends from Torch, a package written for the Lua language. For that reason, the package is called torch inside Python: for example, you write import torch, but conda update pytorch.

To install PyTorch, go to its official page. Unfortunately, you cannot install PyTorch with sudo apt install.

Advantages and disadvantages

PyTorch is simpler and easier to learn than TensorFlow, and offers more freedom. As Kirill Doubikov wrote[2],

Overall, the [PyTorch] framework is more tightly integrated with Python language and feels more native most of the times. When you write in TensorFlow sometimes you feel that your model is behind a brick wall with several tiny holes to communicate over.

On the other hand, he wrote:

Currently, TensorFlow is considered as a to-go tool by many researchers and industry professionals. The framework is well documented and if the documentation will not suffice there are many extremely well-written tutorials on the internet. You can find hundreds of implemented and trained models on github, start here.

PyTorch is relatively new compared to its competitor (and is still in beta), but it is quickly getting its momentum. Documentation and official tutorials are also nice. PyTorch also include several implementations of popular computer vision architectures which are super-easy to use.

As of September 2019, PyTorch is no longer in beta, but the difference still holds.

TensorFlow has a great visualization tool, TensorBoard. Starting with version 1.1, PyTorch also fully supports TensorBoard.


The basic object in PyTorch is the tensor. Tensors are similar to numpy matrices, with two important additions: they work with CUDA, and they can calculate gradients.

Tensors are created and manipulated similarly to numpy matrices:

>>> import time
>>> import numpy as np
>>> import torch
>>> a = np.random.rand(10000, 10000).astype(np.float32)
>>> b = np.random.rand(10000, 10000).astype(np.float32)
>>> t = time.time(); c = np.matmul(a, b); time.time()-t
>>> a1 = torch.rand(10000, 10000, dtype=torch.float32) # note how torch.rand supports dtype
>>> b1 = torch.rand(10000, 10000, dtype=torch.float32)
>>> t = time.time(); c1 = torch.matmul(a1, b1); time.time()-t
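
The comparison above can be repeated at a much smaller size to confirm that both libraries compute the same product. This is only a sketch: the size 200 and the tolerances are arbitrary choices, and everything runs on the CPU.

```python
import time

import numpy as np
import torch

n = 200  # far smaller than 10000 so the check runs quickly (arbitrary choice)
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t = time.perf_counter()
c_np = np.matmul(a, b)
numpy_seconds = time.perf_counter() - t

# torch.from_numpy wraps the same data, so both libraries multiply identical inputs
a1, b1 = torch.from_numpy(a), torch.from_numpy(b)
t = time.perf_counter()
c_torch = torch.matmul(a1, b1)
torch_seconds = time.perf_counter() - t

# float32 results may differ in the last digits, so compare with a tolerance
same = np.allclose(c_np, c_torch.numpy(), rtol=1e-3, atol=1e-3)
print(numpy_seconds, torch_seconds, same)
```

The measured times depend heavily on the machine and on the linear algebra backend each library was built with, so they should be read as a rough indication only.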

All functions such as np.ones, np.zeros, np.empty and so on, as well as other main functions and arithmetic operators, are also present in torch:

   >>> torch.ones(2,2)
   tensor([[1., 1.],
           [1., 1.]])
   >>> torch.ones(2,2, dtype=torch.int32)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int32)
   >>> a=torch.ones(2,2) # or torch.ones((2,2)) which is the same
   >>> b=a+1
   >>> c=a*b
   >>> c.reshape(1,4) # or c.view(1,4), which gives the same result here
   tensor([[2., 2., 2., 2.]])
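
One detail worth knowing about view is that it does not copy data: the reshaped tensor shares storage with the original. The sketch below illustrates this with the same tensors as above; the variable name flat is just an illustrative choice.

```python
import torch

a = torch.ones(2, 2)
c = a * (a + 1)        # element-wise: every entry is 1 * 2 = 2
flat = c.view(1, 4)    # a view of c with a different shape; no data is copied
flat[0, 0] = 10.0      # writing through the view...
print(c[0, 0].item())  # ...is visible in the original tensor as well
```

For a contiguous tensor like this one, reshape behaves the same way; it falls back to copying only when a view is not possible.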

For tensors, size is a method that returns a torch.Size object, rather than an attribute that holds a tuple. This is convenient, because torch.Size inherits from tuple and defines some additional methods:

>>> a=torch.ones(2,3,4)
>>> a.size()
torch.Size([2, 3, 4])
>>> a.size().numel()
24
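
Because torch.Size is a tuple subclass, the usual tuple operations work on it. A minimal sketch (the names depth, rows and cols are illustrative only):

```python
import torch

a = torch.ones(2, 3, 4)
size = a.size()

print(isinstance(size, tuple))  # torch.Size is a subclass of tuple
depth, rows, cols = size        # so it unpacks like any tuple
print(size[0], len(size))       # and supports indexing and len()
print(size.numel())             # extra method: total number of elements
```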

The functions sum(), mean() and so on do not return a number but a zero-dimensional tensor. Tensor elements are also zero-dimensional tensors rather than numbers:

   >>> a = torch.ones(2,2)
   >>> a.sum()
   tensor(4.)
   >>> a.sum().size()
   torch.Size([])
   >>> a.sum().dim()
   0
   >>> a[0,0]
   tensor(1.)

To convert a zero-dimensional tensor to a number, you should explicitly call the function item:

   >>> a.sum().item()
   4.0
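
As a small complement, item handles a single value, while tolist converts a whole tensor to nested Python lists. A sketch, assuming a small CPU tensor:

```python
import torch

a = torch.ones(2, 2)
total = a.sum()          # zero-dimensional tensor
as_float = total.item()  # plain Python float
as_lists = a.tolist()    # nested Python lists of floats

print(type(as_float), as_lists)
```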

Instead of numpy's astype, torch has the method to:

   >>> a = torch.ones(2,2)
   >>> a.to(torch.int16)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int16)

The name is different because to can do more than just change element types. It can also move data to and from CUDA, and it works for a wide range of torch objects, including neural networks.
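
The different uses of to can be sketched as follows. The small torch.nn.Linear layer is only an illustration, not something taken from the text above, and the device is kept as the CPU so that the snippet runs on any machine.

```python
import torch

a = torch.ones(2, 2)
a16 = a.to(torch.int16)            # change the element type
a_cpu = a.to(torch.device('cpu'))  # choose a device (here it stays on the CPU)

layer = torch.nn.Linear(2, 2)      # a tiny network layer, used only as an example
layer = layer.to(torch.float64)    # converts the layer's parameters

print(a16.dtype, a_cpu.device, layer.weight.dtype)
```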

Tensors and numpy matrices

Since tensors and numpy matrices are so similar, it would be nice if we could convert one to the other. And indeed we can; it is a piece of cake. To convert a tensor to a matrix, just call the numpy method. For the opposite direction, call the torch.tensor constructor:

   >>> a=torch.ones(2,2, dtype=torch.float16)
   >>> a.numpy()
   array([[1., 1.],
          [1., 1.]], dtype=float16)
   >>> b=np.ones((2,2), dtype=np.float16)
   >>> torch.tensor(b)
   tensor([[1., 1.],
           [1., 1.]], dtype=torch.float16)
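
When converting, it helps to know which calls copy the data and which share it. The sketch below uses torch.from_numpy, which is not mentioned above but is the memory-sharing counterpart of the torch.tensor constructor; float32 is an arbitrary choice.

```python
import numpy as np
import torch

b = np.ones((2, 2), dtype=np.float32)
copied = torch.tensor(b)      # copies the data
shared = torch.from_numpy(b)  # shares memory with b

b[0, 0] = 5.0
print(copied[0, 0].item())    # unchanged, because it is a copy
print(shared[0, 0].item())    # follows the change made to b

t = torch.ones(2, 2)
view = t.numpy()              # for a CPU tensor, numpy() shares memory too
t[1, 1] = 7.0
print(view[1, 1])
```

If an independent array is needed, calling copy() on the numpy result (or clone() on the tensor first) avoids the sharing.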


While you can use PyTorch without CUDA, running on a CUDA GPU can accelerate computations by a factor of 10-20.

Before using CUDA, check whether it is available. Type:

   >>> torch.cuda.is_available()

If it returns False, you may skip the rest of this section.

You may also check the versions of CUDA and of the cuDNN library:

   >>> torch.version.cuda
   >>> torch.backends.cudnn.version()
   >>> torch.backends.cudnn.enabled
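
A common way to use the availability check is to pick the device once and fall back to the CPU when CUDA is missing. A minimal sketch (the variable names are illustrative):

```python
import torch

# Use the GPU when CUDA is available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3, 3, device=device)
y = (x @ x).sum().item()  # each entry of x @ x is 3, so the sum is 27
print(device, y)
```

Code written this way runs unchanged on machines with and without a GPU.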

Unlike numpy matrices, tensors can be easily moved to and from CUDA memory. In CUDA, you can do almost everything you can do outside of it. If your computer has a CUDA-capable GPU, and you have installed the driver (NVIDIA CUDA 10.0 or higher), you can do the following:

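import time
import torch
cuda = torch.device('cuda')
a = torch.randn(10000, 10000, device=cuda)
b = torch.randn(10000, 10000, device=cuda)
torch.cuda.synchronize() # wait until a and b are ready before starting the clock
# CUDA calls are asynchronous, so synchronize again before reading the clock
t = time.time(); c = torch.matmul(a, b); torch.cuda.synchronize(); print(time.time()-t)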

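On my computer, the time was 0.4 seconds. Since multiplying two 10000 × 10000 matrices takes 10000^3 = 10^12 scalar multiplications, this is about 2.5 × 10^12 multiplications per second.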
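
The arithmetic behind such a figure can be sketched as follows: an n × n matrix product needs n^3 scalar multiplications, so the rate is n^3 divided by the elapsed time. The helper matmul_rate is a hypothetical name introduced for this sketch, and the size 500 is chosen only so that it also runs quickly on a CPU.

```python
import time

import torch


def matmul_rate(n, device):
    """Return scalar multiplications per second for one n x n matmul (hypothetical helper)."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == 'cuda':
        torch.cuda.synchronize()  # make sure a and b are ready before timing
    start = time.perf_counter()
    torch.matmul(a, b)
    if device.type == 'cuda':
        torch.cuda.synchronize()  # CUDA calls are asynchronous; wait for the result
    elapsed = time.perf_counter() - start
    return n ** 3 / elapsed


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
rate = matmul_rate(500, device)

# The figure quoted in the text: 10000^3 multiplications in 0.4 seconds
reported = 10000 ** 3 / 0.4
print(rate, reported)
```

A single run like this includes warm-up effects, so repeating the measurement a few times gives a more trustworthy number.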

You can easily move tensors to and from CUDA memory with the to method:

>>> cuda = torch.device('cuda')
>>> cpu = torch.device('cpu')
>>> a = torch.ones(5,5)
>>> b = a.to(cuda) # move to cuda
>>> c = b.to(cpu) # move back to cpu
>>> a.device
device(type='cpu')
>>> b.device
device(type='cuda', index=0)
>>> c.device
device(type='cpu')

You cannot mix CUDA and CPU tensors in your expressions:

>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float
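
The usual remedy is to move one operand to the other's device before combining them. A sketch that also runs on machines without CUDA, where both tensors simply stay on the CPU:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.ones(5, 5)             # created on the CPU
b = torch.ones(5, 5).to(device)  # on the GPU if there is one

total = a.to(b.device) + b       # both operands now live on b's device
result = total.cpu()             # bring the result back to the CPU
print(result.device, result.sum().item())
```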


Calculating gradients via backpropagation is also easy. You need to specify the requires_grad parameter ("requires" with an s, "grad" without) when creating a tensor, and then call the backward method on the result.

>>> a=torch.ones(2,2, requires_grad=True)
>>> b=torch.eye(2,2, requires_grad=True)
>>> c = a*a*(b+1)
>>> d=c.sum() 
>>> d.backward() # calculate gradients
>>> a.grad # gradient of d with respect to a
tensor([[4., 2.],
        [2., 4.]])
>>> b.grad # gradient of d with respect to b
tensor([[1., 1.],
        [1., 1.]])
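
These values can be checked by hand. Since d is the sum of a*a*(b+1) over all elements, the derivative with respect to a is 2*a*(b+1) and the derivative with respect to b is a*a. The sketch below compares autograd's result with these formulas; the names expected_a and expected_b are illustrative.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = torch.eye(2, 2, requires_grad=True)
d = (a * a * (b + 1)).sum()
d.backward()  # fills a.grad and b.grad

with torch.no_grad():  # plain arithmetic, no need to track gradients here
    expected_a = 2 * a * (b + 1)  # derivative of a^2 (b + 1) with respect to a
    expected_b = a * a            # derivative of a^2 (b + 1) with respect to b

print(torch.allclose(a.grad, expected_a), torch.allclose(b.grad, expected_b))
```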

How does it work? For each final or intermediate tensor, the system stores how it was computed, which lets backward apply the chain rule step by step from the result back to the inputs.
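
This record can be inspected through the grad_fn attribute of a tensor. A minimal sketch, using a shorter expression than the example above:

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
c = a * a    # intermediate tensor
d = c.sum()  # final tensor

print(a.is_leaf, a.grad_fn)      # a was created by the user, so it has no grad_fn
print(c.grad_fn)                 # records that c came from a multiplication
print(d.grad_fn)                 # records that d came from a sum
print(d.grad_fn.next_functions)  # links from d's record back toward c's record
```

The exact class names printed for grad_fn are internal details and may differ between PyTorch versions.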

In-place operators