MyTorch – Minimalist autograd in 450 lines of Python

Original link: https://github.com/obround/mytorch

## mytorch: a lightweight autograd library

`mytorch` is a Python implementation of an automatic differentiation (autograd) system that mimics the PyTorch API. It uses NumPy for the numerical work and follows a graph-based reverse-mode automatic differentiation approach similar to PyTorch's. Key features include easy extensibility, support for `torch.autograd.backward` and `torch.autograd.grad`, and the ability to compute derivatives of arbitrary order *without* setting `create_graph=True` for higher-order computations, a simplification over PyTorch. The library handles scalar and non-scalar operations, including broadcasting, and computes gradients accurately, as the examples below show. Although it is currently NumPy-based, the author notes that it could run on a GPU via CuPy or Numba, or even be rewritten at a lower level on top of BLAS calls, while conceding that the latter would be largely pointless. In short, `mytorch` offers a simplified, educational implementation of automatic differentiation.


Original:

Easily extensible autograd implemented in Python with a PyTorch-like API. Uses NumPy to do the heavy lifting. The implementation is very similar to PyTorch's (graph-based reverse-mode autodiff). It wouldn't be too tough to extend the autograd, implement torch.nn, and possibly run on a GPU (presumably with CuPy or Numba). It would be an interesting (but useless) endeavor to rewrite mytorch in a low-level language using BLAS library calls instead of NumPy, just like PyTorch.
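
To give a feel for why extending a graph-based reverse-mode autograd is straightforward: each operation only needs to record its inputs and supply a backward rule. The sketch below is a self-contained illustration of that idea in plain NumPy; the `Node` and `mul` names are illustrative assumptions and do not reflect mytorch's actual internal classes.

import numpy as np

class Node:
    """Illustrative reverse-mode node: holds a value, a gradient slot, and a
    closure that maps the upstream gradient to gradients for its parents."""
    def __init__(self, value, parents=(), backward=lambda g: ()):
        self.value = np.asarray(value, dtype=np.float64)
        self.grad = np.zeros_like(self.value)
        self.parents = parents
        self._backward = backward

def mul(x, y):
    # Forward pass records the inputs; the backward rule is d(xy)/dx = y, d(xy)/dy = x.
    return Node(x.value * y.value, parents=(x, y),
                backward=lambda g: (g * y.value, g * x.value))

def backward(root):
    # Seed the output gradient with ones and push gradients to the parents.
    # (Sufficient for the tree-shaped graph built here; a full implementation
    # would traverse the graph in reverse topological order.)
    root.grad = np.ones_like(root.value)
    stack = [root]
    while stack:
        node = stack.pop()
        for parent, g in zip(node.parents, node._backward(node.grad)):
            parent.grad = parent.grad + g
            stack.append(parent)

a, b = Node(3.0), Node(4.0)
c = mul(a, b)
backward(c)
print(a.grad, b.grad)  # 4.0 3.0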

mytorch supports the computation of arbitrarily high derivatives for both scalars and non-scalars. Both torch.autograd.backward and torch.autograd.grad are supported.

import mytorch as torch

a = torch.tensor(3., dtype=torch.float32, requires_grad=True)
b = torch.tensor(10., dtype=torch.float32, requires_grad=True)
c = 2 + (a + b ** 2) / (a + b + a * b)

print("a =", a)
print("b =", b)
print("c = 2 + (a + b ** 2) / (a + b + a * b) =", c)

# NOTE: You could also use c.backward() to accumulate the gradients in a.grad and b.grad
dc_da, dc_db = torch.autograd.grad(c, [a, b])
# NOTE: To get higher order derivatives like below, pytorch would require ∂c/∂a and
# ∂c/∂b to be calculated with create_graph=True; mytorch does not require it
d2c_da2 = torch.autograd.grad(dc_da, [a])[0]
d2c_db2 = torch.autograd.grad(dc_db, [b])[0]
print(f"∂c/∂a = {dc_da}")
print(f"∂c/∂b = {dc_db}")
print(f"∂²c/∂a² = {d2c_da2}")
print(f"∂²c/∂b² = {d2c_db2}")

Output:

a = tensor(3.0, requires_grad=True)
b = tensor(10.0, requires_grad=True)
c = 2 + (a + b ** 2) / (a + b + a * b)
  = tensor(4.395348787307739, requires_grad=True)
∂c/∂a = tensor(-0.5895078420767982, requires_grad=True)
∂c/∂b = tensor(0.24229313142239048, requires_grad=True)
∂²c/∂a² = tensor(0.3016086633881293, requires_grad=True)
∂²c/∂b² = tensor(0.0014338360144389717, requires_grad=True)
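
As a quick sanity check (not part of the repository), the same values can be reproduced with a plain finite-difference approximation of the expression for c; the `central_diff` helper below is purely illustrative:

def f(a, b):
    # Same expression as c above
    return 2 + (a + b ** 2) / (a + b + a * b)

def central_diff(fn, x, eps=1e-5):
    # Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

a0, b0 = 3.0, 10.0
print(central_diff(lambda a: f(a, b0), a0))  # ≈ -0.589508, matches ∂c/∂a
print(central_diff(lambda b: f(a0, b), b0))  # ≈  0.242293, matches ∂c/∂b
# Nested central difference for the second derivative, ≈ 0.301609, matches ∂²c/∂a²
print(central_diff(lambda a: central_diff(lambda x: f(x, b0), a), a0))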

Here is a non-scalar example (with broadcasting):

import mytorch as torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32, requires_grad=True)
b = torch.tensor([7, 8, 9], dtype=torch.float32, requires_grad=True)
# b is broadcasted
c = a + b

print("a =", a)
print("b =", b)
print("c =", c)
c.backward(torch.ones(2, 3))
print("∂c/∂a =", a.grad)
print("∂c/∂b =", b.grad)

Output:

a = tensor([[1. 2. 3.]
            [4. 5. 6.]], requires_grad=True)
b = tensor([7. 8. 9.], requires_grad=True)
c = tensor([[ 8. 10. 12.]
            [11. 13. 15.]], requires_grad=True)
∂c/∂a = tensor([[1. 1. 1.]
                [1. 1. 1.]], requires_grad=False)
∂c/∂b = tensor([2. 2. 2.], requires_grad=False)
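
The gradient for b is worth a second look: because b was implicitly expanded across the two rows of a, its gradient is the upstream gradient summed over the broadcast (row) axis, which is why each entry is 2. A minimal NumPy sketch of that reduction follows; the `sum_to_shape` helper is illustrative and not part of mytorch's API:

import numpy as np

def sum_to_shape(grad, shape):
    # Reduce an upstream gradient back to the shape of the original (broadcast)
    # operand by summing over the axes that broadcasting added or expanded.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)          # drop leading broadcast axes
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)  # collapse size-1 axes
    return grad

upstream = np.ones((2, 3))               # gradient flowing into c = a + b
print(sum_to_shape(upstream, (2, 3)))    # ∂c/∂a -> [[1. 1. 1.] [1. 1. 1.]]
print(sum_to_shape(upstream, (3,)))      # ∂c/∂b -> [2. 2. 2.]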