(comments)

Original link: https://news.ycombinator.com/item?id=39387850

In summary, Llama2.jl provides vanilla Julia code for training small Llama2-style models on the CPU. While finding open-source training data can be challenging due to legal restrictions, Transformer-based neural networks have become increasingly popular and approachable in recent years, thanks to resources such as Andrej Karpathy's "NN Zero to Hero" YouTube series and Neel Nanda and Callum McDougall's Transformer-from-scratch tutorials. The question remains whether training a large language model in pure Python imposes limits on performance and quality; nevertheless, projects such as nanoGPT demonstrate that the necessary code is available despite these challenges. Overall, the popularity of natural language processing and deep learning research continues to inspire innovative approaches to teaching machines language much as humans learn a new language: by recognizing patterns in text rather than memorizing vocabulary by rote.

Original
Building an LLM from Scratch: Automatic Differentiation (2023) (bclarkson-code.github.io)
334 points by netwrt 1 day ago | 16 comments

I did a similar thing for Julia: Llama2.jl contains vanilla Julia code [1] for training small Llama2-style models on the CPU.

[1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training



How hard is it to find open-source data nowadays? I saw that Books3 has already been made illegal to train on.


Great stuff. Thanks for sharing.


Everyone should go through this rite-of-passage work and get to the "Attention Is All You Need" implementation. It's a world where engineering and the academic papers are very close and reproducible, and working through it is a must for progressing in the field.

(See also Andrej Karpathy's "Neural Networks: Zero to Hero" series on YouTube; it's very good and similar in spirit to this work.)
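
Since the comment above points at "Attention Is All You Need", here is a minimal sketch of the paper's core operation, scaled dot-product attention, in plain NumPy. This is illustrative only (the function name is my own), not code from the article or the series:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
        scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V                              # attention-weighted sum of values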



I would also recommend going through Callum McDougall and Neel Nanda's fantastic Transformer from Scratch tutorial. It takes a different approach to conceptualizing the model (or at least, it implements it in a way that emphasizes different characteristics of Transformers and self-attention), which I found deeply satisfying when I first explored it.

https://arena-ch1-transformers.streamlit.app/%5B1.1%5D_Trans...



Thanks for sharing. This is a nice resource.


That magic moment in Karpathy's first video when he gets to the loss function and calls backward for the first time: that's when it clicked for me. Highly recommended!
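
For anyone curious what that backward call is doing, here is a minimal sketch in the spirit of Karpathy's micrograd (my own simplification, not his actual code): each value remembers how it was computed, and backward pushes gradients to its parents via the chain rule.

    class Value:
        """A scalar that records how it was computed so gradients can flow back."""
        def __init__(self, data, parents=(), local_grads=()):
            self.data = data
            self.grad = 0.0
            self._parents = parents          # the Values this one was computed from
            self._local_grads = local_grads  # d(self)/d(parent) for each parent

        def __add__(self, other):
            return Value(self.data + other.data, (self, other), (1.0, 1.0))

        def __mul__(self, other):
            return Value(self.data * other.data, (self, other),
                         (other.data, self.data))

        def backward(self, grad=1.0):
            self.grad += grad
            for parent, local in zip(self._parents, self._local_grads):
                parent.backward(grad * local)  # chain rule

    x, y = Value(3.0), Value(2.0)
    loss = x * y + x       # d(loss)/dx = y + 1, d(loss)/dy = x
    loss.backward()
    print(x.grad, y.grad)  # 3.0 3.0

(A full implementation like micrograd topologically sorts the graph and propagates each node's accumulated gradient once; this per-edge recursion is simpler but computes the same derivatives.)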


+1 for Karpathy, the series is really good


Is this YouTube series also "from scratch (but not really)"?

Edit - it is. Not to talk down on the series. I'm sure it's good, but it is actually "LLM with PyTorch".

Edit - I looked again, and I was actually not correct. He does ultimately use frameworks, but he spends time early on explaining how they function under the hood.



I appreciate you coming back and giving more details; it encourages me to look into it now. Maybe my expectations on the internet are just low, but I thought it was a virtuous act worth the effort. I wish more people would continue with skepticism but be willing to follow through and let their opinions change given solid evidence.


Very well written. AD is like magic, and this is a good exposition of the basic building block.

I quite like Jeremy's approach: https://nbviewer.org/github/fastai/fastbook/blob/master/17_f...

It shows a very simple "Pythonic" approach to assembling the gradient of a composition of functions from the gradients of its components.
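
As a rough illustration of that idea (my own sketch, not code from the fastai notebook): the chain rule lets you build the derivative of f_n(...f_1(x)...) by multiplying local derivatives as you walk forward through the composition.

    import math

    # Each operation is a (function, derivative) pair.
    square = (lambda x: x * x, lambda x: 2 * x)
    sin    = (math.sin, math.cos)

    def compose_with_grad(*ops):
        """Compose ops left-to-right; return the composite and its derivative."""
        def f(x):
            for fn, _ in ops:
                x = fn(x)
            return x

        def df(x):
            g = 1.0
            for fn, dfn in ops:
                g *= dfn(x)  # local derivative at the current input
                x = fn(x)    # step forward to feed the next function
            return g

        return f, df

    f, df = compose_with_grad(square, sin)  # f(x) = sin(x**2)
    print(df(1.0))  # cos(1) * 2 = 1.0806...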



As a chronic premature optimizer, my first reaction was: "Is this even possible in vanilla Python???" Obviously it's possible, but can you train an LLM before the heat death of the universe? A perceptron, sure, of course. A deep learning model, plausible if it's not too deep. But a large language model? I.e., the kind of LLM necessary for "from vanilla Python to functional coding assistant."

But obviously the author already thought of that. The source repo has a great motto: "It don't go fast but it do be goin'" [1]

I love the idea of the project and I'm curious to see what the endgame runtime will be.

[1] https://github.com/bclarkson-code/Tricycle



Why wouldn't it be possible? You can generate machine code with Python and call into it with ctypes. All your deep learning code is still in Python, but at runtime it gets JIT-compiled into something faster.
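
A minimal sketch of that pattern (my own illustration, assuming a C compiler is available as cc on the PATH, and compiling C source rather than emitting raw machine code): build a native function at runtime and call it through ctypes.

    import ctypes, os, subprocess, tempfile

    C_SRC = """
    float dot_f32(const float *a, const float *b, int n) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++) acc += a[i] * b[i];
        return acc;
    }
    """

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "kernel.c")
        lib = os.path.join(tmp, "libkernel.so")
        with open(src, "w") as fh:
            fh.write(C_SRC)
        # Compile the hypothetical kernel into a shared library at runtime.
        subprocess.check_call(["cc", "-O2", "-shared", "-fPIC", src, "-o", lib])

        kernel = ctypes.CDLL(lib)
        kernel.dot_f32.restype = ctypes.c_float
        kernel.dot_f32.argtypes = [ctypes.POINTER(ctypes.c_float),
                                   ctypes.POINTER(ctypes.c_float),
                                   ctypes.c_int]

        n = 4
        a = (ctypes.c_float * n)(1, 2, 3, 4)
        b = (ctypes.c_float * n)(5, 6, 7, 8)
        print(kernel.dot_f32(a, b, n))  # 70.0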


Is there an existing SLM that resembles an LLM in architecture and includes the code for training it?

I realize the cost and time to train may be prohibitive, and that quality on general English might be very limited, but is the code itself available?



Not sure what you mean by SLM, but https://github.com/karpathy/nanoGPT


The only problem is that it's implemented in Python. For one, I hate installing Python on my machine, and I don't know how to manage its dependencies. macOS required an upgrade just to install native stuff. Such a hell.
