Building an LLM from Scratch: Automatic Differentiation (2023)

Original link: https://bclarkson-code.github.io/posts/llm-from-scratch-scalar-autograd/post.html

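A key limitation of the current implementation is that it only covers simple mathematical expressions involving addition, subtraction, and multiplication. With some modifications, however, it can be extended to handle the vector and matrix operations needed to build neural networks, which would make it possible to create complex models with the Tricycle framework. In addition, it could support popular machine learning libraries such as Keras by converting to the TensorsFlowGraph format.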

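In summary, Llama2.jl provides vanilla Julia code for training small Llama2-style models on a CPU. Although finding open-source data is sometimes challenging because of legal restrictions, Transformer-based neural networks have become increasingly popular and accessible in recent years thanks to Andrej Karpathy's "Neural Networks: Zero to Hero" YouTube series and Neel Nanda and Callum McDougall's transformer-from-scratch tutorial. The question remains, however, whether training large language models using only Python limits performance and quality. Even so, projects such as nanoGPT show that the necessary code is available despite these challenges. Overall, the popularity of natural language processing and deep learning research continues to inspire new ways of teaching machines language in much the same way humans learn a new one: by understanding patterns in text rather than being forced to memorize large vocabularies.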

Original text


The LLM from scratch tech tree


Before we can move on to building modern features like Rotary Positional Encodings, we first need to figure out how to differentiate with a computer. The backpropagation algorithm that underpins the entire field of Deep Learning requires the ability to differentiate the outputs of neural networks with respect to (wrt) their inputs. In this post, we’ll go from nothing to an (admittedly very limited) automatic differentiation library that can differentiate arbitrary functions of scalar values.

This one algorithm will form the core of our deep learning library that, eventually, will include everything we need to train a language model.

Creating a tensor

We can’t do any differentiation if we don’t have any numbers to differentiate. We’ll want to add some extra functionality that isn’t in standard float types, so we’ll need to create our own. Let’s call it a Tensor.
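As a rough illustration of what such a type might look like at this stage, here is a minimal sketch: a class that does nothing more than wrap a single number. The attribute name `value` and the `__repr__` format are assumptions made for this example, not necessarily what the post or Tricycle uses.

```python
class Tensor:
    """A single number wrapped in our own type (illustrative sketch only)."""

    def __init__(self, value: float):
        # `value` is an assumed attribute name for the wrapped number.
        self.value = value

    def __repr__(self) -> str:
        return f"Tensor({self.value!r})"


x = Tensor(3.0)
print(x)  # Tensor(3.0)
```

Owning the type is what later lets us attach extra state (such as a gradient) and overload arithmetic operators, neither of which is possible on a built-in float.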

According to Wolfram Alpha, the derivative of \(L\) wrt \(x\) is: \[\frac{\partial L}{\partial x} = 2m (c + mx - y)\] Plugging the values for our tensors in (\(m = 2\), \(x = 3\), \(c = 4\), \(y = 1\)), we get \[2\times2\times(4 + (2\times3) - 1) = 36\]
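One way to sanity-check a hand-derived gradient like this is to compare it against a numerical finite difference. The sketch below assumes \(L = (c + mx - y)^2\), which is consistent with the derivative quoted above but is not stated in this excerpt. The function names are made up for the example.

```python
def loss(x: float, m: float = 2.0, c: float = 4.0, y: float = 1.0) -> float:
    # Assumed form of L; only its derivative appears in the text.
    return (c + m * x - y) ** 2


def analytic_grad(x: float, m: float = 2.0, c: float = 4.0, y: float = 1.0) -> float:
    # The closed-form derivative quoted from Wolfram Alpha.
    return 2 * m * (c + m * x - y)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    # Numerical slope estimate: (f(x + h) - f(x - h)) / (2h).
    return (f(x + h) - f(x - h)) / (2 * h)


exact = analytic_grad(3.0)
approx = central_difference(loss, 3.0)
print(exact)   # 36.0
print(approx)  # very close to 36
```

The two numbers agreeing is not a proof, but it is a cheap check that catches sign and factor-of-two mistakes.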

According to Wolfram Alpha, the derivative of this expression is: \[\frac{d f(x)}{dx} = -38 + 102 x - 33 x^2 + 8 x^3 + 30 x^4\]

If we plug 2 into this equation, the answer is apparently 578 (again, thanks to Wolfram Alpha).
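The arithmetic is easy to confirm without Wolfram Alpha. This small sketch evaluates the derivative polynomial at \(x = 2\) using Horner's rule; the helper name `horner` is just for this example.

```python
def horner(coefficients, x):
    """Evaluate a polynomial given coefficients from highest to lowest degree."""
    result = 0
    for coefficient in coefficients:
        result = result * x + coefficient
    return result


# 30x^4 + 8x^3 - 33x^2 + 102x - 38, highest degree first
derivative_coefficients = [30, 8, -33, 102, -38]
value_at_2 = horner(derivative_coefficients, 2)
print(value_at_2)  # 578
```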

Let’s try it with our algorithm
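What follows is a minimal sketch of scalar reverse-mode differentiation, not the post's actual code or Tricycle's API. The class name `Scalar` and its methods are hypothetical. Because the original expression is not shown here, the sketch uses \(f(x) = 6x^5 + 2x^4 - 11x^3 + 51x^2 - 38x\) as a stand-in: it is one function whose derivative matches the polynomial above, and it is not necessarily the expression the author differentiated.

```python
class Scalar:
    """A number that remembers how it was computed (hypothetical sketch)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Pairs of (input node, derivative of this node wrt that input).
        self.parents = parents

    @staticmethod
    def wrap(other):
        return other if isinstance(other, Scalar) else Scalar(other)

    def __add__(self, other):
        other = Scalar.wrap(other)
        return Scalar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Scalar.wrap(other)
        return Scalar(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Scalar.wrap(other)
        return Scalar(
            self.value * other.value,
            ((self, other.value), (other, self.value)),
        )

    def backward(self):
        # Order nodes so that each one comes after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local_derivative in node.parents:
                parent.grad += node.grad * local_derivative


x = Scalar(2.0)
# Stand-in polynomial built with only +, - and * (Horner form).
f = Scalar(6.0)
for coefficient in [2.0, -11.0, 51.0, -38.0, 0.0]:
    f = f * x + coefficient
f.backward()
print(f.value)  # 264.0
print(x.grad)   # 578.0
```

Each operation records its inputs together with the local derivative, so `backward` only has to multiply and accumulate along the graph. Accumulating with `+=` matters because `x` is used several times in the expression.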

Tricycle is the name of the deep learning framework we’re building.
