Candle dev here, we also support training/backprop! We certainly focus on optimizing inference performance, but hopefully that should improve training efficiency too.
---
That's true, it has an impact, but I think there's still space for "slightly slower with 2x the memory" models. For many local uses, new cards are well past the "fast enough" line, but having 64 GB on them would be really beneficial. I'd love to see some experiments / different SKUs in this area, given that people are already DIY-ing extra memory onto NVIDIA cards (https://hackaday.com/2021/01/29/add-an-extra-8gb-of-vram-to-... ; there were stable experiments later on, but I don't have a link now).
---
Graphics card manufacturers believe that selling high-memory consumer graphics cards would cannibalize the market for commercial compute cards, so they won't do it. That's all.
---
When Lex recently talked to Andrej, Andrej said that he gets positively obsessed with a problem and tells himself "this must exist." I imagine this is one of those outputs.
---
The transformer itself just takes arrays of numbers and turns them into arrays of numbers. What you are interested in is the process that happens before and after the transformer.
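A minimal sketch of that before/after process, using a hypothetical toy vocabulary (real tokenizers like BPE are far more involved, and the model here is just a stand-in that maps numbers to numbers):

```python
# Toy pipeline: text -> token ids (arrays of numbers) -> model -> ids -> text.
vocab = {"hello": 0, "world": 1, "<unk>": 2}      # assumed toy vocabulary
inv_vocab = {i: w for w, i in vocab.items()}

def encode(text):
    """Tokenizer: runs BEFORE the model, turning text into numbers."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.split()]

def toy_transformer(ids):
    """Stand-in for the model: numbers in, numbers out (here, identity)."""
    return list(ids)

def decode(ids):
    """Runs AFTER the model, turning numbers back into text."""
    return " ".join(inv_vocab[i] for i in ids)

ids = encode("hello world")
print(ids)                            # [0, 1]
print(decode(toy_transformer(ids)))   # hello world
```

The interesting engineering lives in `encode` and `decode` (and in sampling from the output distribution); the transformer in the middle only ever sees arrays of numbers.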
---
Is this able to replace PyTorch (and the like) in normal practice? No. Does it show that, in general, the most-used ML frameworks are a mess? Yes.
---
The author has a whole series where he does exactly that: YouTube videos, code examples, documentation, everything. He explains the math, explains how to code it, explains the architecture. Everything.
---
> How am I supposed to know that?

You're not supposed to know that. You asked a question, and this is you being told the answer. It's very convenient that the author of the post is quite literally the world's most prolific teacher on this topic; it makes it easy to find Karpathy. You shouldn't be expected to otherwise know that (or else why ask, if you knew?).

> I didn't realize variables had to be so short in C. Glad I write C++ professionally, where they've added support for longer variable names.

This feels like a joke, but old C compilers did have limits on identifier length, which is part of why C historically had shorter variable names than more modern languages. Sorry if it came off rude; the internet is hard to communicate over.

https://publications.gbdirect.co.uk/c_book/chapter2/keywords...
---
On one hand, it's really nice to see the whole thing in 1000 lines of C code. On the other hand, that malloc function low-key terrifies me. :)
---
OT, but a question from someone curious: is CUDA still entrenched as the only option for doing AI, or is there growing support for AMD/Intel/other ways of doing AI?
---
Randomly stumbled over this [1] post from another fed-up open-source contributor, prompted by several serious issues with AMD's GPU drivers and firmware that have remained unresolved for years. It also references the geohot decision you mention. Some quotes:

> I find it incredible that these companies that have large support contracts with you and have invested hundreds of thousands of dollars into your products, have been forced to turn to me, a mostly unknown self-employed hacker with very limited resources, to try to work around these bugs (design faults?) in your hardware.

> In the VFIO space we no longer recommend AMD GPUs at all; in every instance where people ask which GPU to use for their new build, the advice is to use NVidia.

[1]: https://www.reddit.com/r/Amd/comments/1bsjm5a/letter_to_amd_...
---
Python has been popular for this because it’s convenient to quickly hack on and experiment with, not because it’s the most efficient thing.
---
The overhead really isn't that bad, is it? The Python code mostly just says "multiply matrix A by matrix B," and the actual computation is done by optimized low-level code.
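A small stdlib-only illustration of why that overhead is usually tolerable: one call that dispatches into C-implemented code (here the builtin `sum`, standing in for an optimized matmul kernel) amortizes the interpreter's per-operation cost over the whole computation, whereas a pure-Python loop pays it on every iteration.

```python
import time

data = list(range(2_000_000))

# Pure-Python loop: the interpreter executes millions of bytecode steps.
t0 = time.perf_counter()
total_py = 0
for x in data:
    total_py += x
py_time = time.perf_counter() - t0

# One dispatch into a C-implemented builtin, then a tight loop in native code.
t0 = time.perf_counter()
total_c = sum(data)
c_time = time.perf_counter() - t0

assert total_py == total_c
print(f"Python loop: {py_time:.3f}s, builtin sum: {c_time:.3f}s")
```

The same shape holds for frameworks: as long as each Python-level call hands off a large chunk of work (a big matmul rather than many tiny element-wise ops), the dispatch overhead mostly disappears into the noise.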
---
> The issue was using either a dict or a list in a hot loop and changing it to the other sped it up like 1000x.

The programmer using the wrong data structure is not a problem with the language.
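A quick sketch of how that kind of speedup can happen: membership tests on a list are O(n) linear scans, while a set (or dict) uses O(1) average-time hash lookups, so swapping one for the other inside a hot loop changes the asymptotics, not just the constant factor.

```python
import time

n = 10_000
needles = list(range(n))
as_list = list(range(n))   # membership test: O(n) scan per lookup
as_set = set(range(n))     # membership test: O(1) average hash lookup

t0 = time.perf_counter()
hits_list = sum(1 for x in needles if x in as_list)
list_time = time.perf_counter() - t0

t0 = time.perf_counter()
hits_set = sum(1 for x in needles if x in as_set)
set_time = time.perf_counter() - t0

assert hits_list == hits_set == n
print(f"list membership: {list_time:.3f}s, set membership: {set_time:.4f}s")
```

The same applies in the other direction: a dict is the wrong choice when you only ever iterate in order and index by position. Either way, that's a data-structure decision, not a Python problem.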
---
The plan is to eventually implement it with CUDA: "Currently, I am working on [...] direct CUDA implementation, which will be significantly faster and probably come close to PyTorch."