Here are links to the most recent pull requests sent
I think it's a good idea for everyone to download and be able to run an LLM locally, even if your machine only meets the minimum requirements, as a pseudo-backup of a large chunk of human knowledge.
The response seems pretty reasonable; it's answering the question it was asked. If you want to ask it how to do the difficult part, ask it about that instead. Expecting it to get the answer right in the first pass is like expecting your code to compile the very first time. You have to have more of a conversation with it to coax out the difference between what you're thinking and what you're actually saying. If you're looking to read a more advanced example of its capabilities and limitations, try https://simonwillison.net/2024/Mar/23/building-c-extensions-...
https://chat.openai.com/share/c8c19f42-240f-44e7-baf4-50ee5e... https://godbolt.org/z/s9Yvnjz7K

I mean, I could write the algorithm by hand pretty quickly in C++, would follow the exact same thought pattern, and would also deal with the edge cases. Factoring in the loss of productivity from the context switch, that's a net negative. This algorithm is also not generic over enough cases, but that's just down to the prompt.

If I can't trust it to write `strip_whitespace` correctly, which is like 5 lines of code, can I trust it to do more without a thorough review of the code and writing a ton of unit tests... Well, I was going to do that anyway. The argument that I just need to learn better prompt engineering to make the LLM do what I want doesn't sit right with me when I could instead just spend the time writing the code.

As I said, your last point is absolutely the place I can see LLMs being actually useful, but then I need to spend a significant amount of time reviewing generated code from an "employee" who is known to make up interfaces or entire libraries that don't exist.
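For reference, a minimal hand-written sketch of the kind of routine being debated (the `std::string`-based signature is just one choice; the thread is really about generic iterators), returning an empty range for the all-whitespace case that comes up further down:

```cpp
#include <cctype>
#include <string>
#include <utility>

// Return the [begin, end) range of `s` with leading and trailing ASCII
// whitespace stripped. An empty or all-whitespace input yields an empty
// range rather than a mismatched iterator pair.
std::pair<std::string::const_iterator, std::string::const_iterator>
strip_whitespace(const std::string& s) {
    auto first = s.begin();
    auto last = s.end();
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
    while (last != first && std::isspace(static_cast<unsigned char>(*(last - 1))))
        --last;
    return {first, last};
}
```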
I'm a Python-slinging data scientist, so C++ isn't my jam (to say the least), but I changed the prompt to the following and gave it to GPT-4:

> Write me an algorithm in C++ which finds the begin and end iterator of a sequence where leading and trailing whitespace is stripped. Please write secure code that handles any possible edge cases.

It gave me this: https://chat.openai.com/share/55a4afe2-5db2-4dd1-b516-a3cacd... I'm not sure what other edge cases there might be, however; this only covers one of them.

In general, I've found LLMs to be marginally helpful. Like, I can't ever remember how to get matplotlib to give me the plot I want, and 9 times out of 10 GPT-4 easily gives me the code I want. Anything even slightly off the beaten path, though, and it quickly becomes absolutely useless.
My guess is that this was generated using GPT-4? With the free GPT I get https://chat.openai.com/share/f533429d-63ca-4505-8dc8-b8d2e7... which has exactly the same problem as my previous example and doesn't consider a string of all whitespace. Sure, GPT-4 is better at that, but that wasn't the argument being made.

The example you gave absolutely is the code I would write on a first draft, since it does cover the edge cases (assuming we aren't dealing with the full Unicode character set and everything that could be considered a space there). However, this is code that is trivial to write in any language, so the "Is it that hard to input a prompt into the free version of ChatGPT and see how it helps with programming?" argument doesn't hold up. Am I to believe it will implement something more complex correctly? This is also code that would absolutely be in hundreds of codebases, so GPT has tons of context for it.
If you want to be an amateur chemist I recommend not getting your instructions from an LLM that might be hallucinating. Chemistry can be very dangerous if you're following incorrect instructions.
Completely agree. Playing around with a weak LLM is a great way to give yourself a little bit of extra healthy skepticism for when you work with the strong ones.
If you have an M1-class or newer machine with sufficient RAM, the medium-sized models (on the order of 30 GB) perform well enough on many tasks to be quite useful without leaking your data.
I'm using Mixtral 8x7b as a llamafile on an M1 regularly for coding help and general Q&A. It's really something wonderful to just run a single command and have this incredible offline resource.
I concur; in my experience Mixtral is one of the best ~30 GB models (likely the best pro-laptop-sized model currently), and Gemma is quite good compared to other sub-8 GB models.
Check out PrivateGPT on GitHub. It pretty much just works out of the box. I got Mistral 7B running on a GTX 970 in about 30 minutes flat on the first try. Yep, that's the triple-digit GTX 970.
It will be paid down the road, but we are not there yet. It's all offline and data is saved locally. You own it; we don't have it, even if you were to ask us for it.
Kiwix provides prepackaged, highly compressed archives of Wikipedia, Project Gutenberg, and many other useful things: https://download.kiwix.org/zim/. Between that and dirt-cheap storage prices, it is possible to have a local, offline copy of more human knowledge than one can sensibly consume in a lifetime. Hell, it's possible to have it all on one's smartphone (just get one with an SD card slot and shove a 1+ TB card in there).
Just create a RAG setup with Wikipedia as the corpus and a low-parameter model to run it, and you can have a basically instantly queryable corpus of human knowledge runnable on an old Raspberry Pi.
... having them talk about events from sci fi stories in response to questions about the real world. Having them confidently lie about pretty much everything. Etc.
> a hedge against a potential library of Alexandria incident

What would cause a Library of Alexandria incident wiping out all human knowledge elsewhere, that would also allow you to run a local LLM?
What about smells or tastes? Or feelings? I can't help but feel we're at the "aliens watch people eat from space and recreate chemically identical food that has no taste" phase of AI development.
If the food is chemically identical then the taste would be the same though, since taste (and smell) is about chemistry. I do get what you're saying though.
Maybe I'm seeing things through a modern lens, but if I were trying to restart civilization and was only left with ChatGPT, I would be enraged and very much not grateful for this.
In all fairness, going up to some random human and yelling AAAAAAAAAAAAAA… at them for long enough will produce some out-of-distribution responses too.
It is invaluable to have a chunk of human knowledge that can tell you things like the Brooklyn Nets won the 1986 Cricket World Cup by scoring 46 yards in only 3 frames
According to ChatGPT:

> Australia won the 1987 Cricket World Cup. The 1986 date is incorrect; there was no Cricket World Cup in 1986. The tournament took place in 1987, and Australia defeated England in the final to win their first title.

https://chat.openai.com/share/e9360faa-1157-4806-80ea-563489... I'm no cricket fan, so someone will have to correct Wikipedia if that's wrong. If you want to point out that LLMs hallucinate, you might want to speak plainly and just come out and say it, or at least give a real-world example and not one where it didn't.
An LLM will always give the same output for the same input. It’s sorta like a random number generator that gives the same list of “random” numbers for the same seed. LLMs get a seed too.
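A toy illustration of the seed analogy (just a PRNG, nothing LLM-specific):

```cpp
#include <iostream>
#include <random>

// Two generators given the same seed emit the same "random" sequence;
// likewise, an LLM sampled with a fixed seed (or greedily, at temperature 0)
// maps the same prompt to the same output.
int main() {
    std::mt19937 a(42), b(42);  // identical seeds
    for (int i = 0; i < 3; ++i)
        std::cout << a() << " == " << b() << '\n';  // identical values
}
```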
> If you want factual answers from a local model it might help to turn the temperature down.

This makes sense. If you interact with a language model and it says something wrong, it is your fault.
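For what it's worth, here is a rough sketch of what the temperature knob actually does during sampling (the function name is made up; real inference code applies this to the model's next-token logits). Lowering the temperature sharpens the distribution toward the top-scoring token, which makes output more repeatable, not necessarily more factual:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert raw next-token logits into probabilities, with a temperature knob.
// temperature < 1 sharpens the distribution; temperature > 1 flattens it.
std::vector<double> softmax_with_temperature(const std::vector<double>& logits,
                                             double temperature) {
    // Subtract the max logit for numerical stability.
    double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}
```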
Are they though? They're lossily compressing trillions of tokens into a few dozen GB. The decompression step is just fuzzy and inefficient.
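For a rough sense of scale (assuming a training corpus on the order of 15 trillion tokens at a couple of bytes per token): that is tens of terabytes of text squeezed into tens of gigabytes of weights, a compression ratio somewhere around 1000:1.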
And it requires massive computational power to decompress, which I don't expect to be available in a catastrophic situation where humans have lost a large chunk of important knowledge.
I don't necessarily agree. It requires massive computing power, but running models smaller than ~70B parameters is possible on consumer hardware, albeit slowly.
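Ballpark: a 70B-parameter model quantized to roughly 4 bits per weight is about 70e9 × 0.5 bytes ≈ 35 GB, which fits in RAM on a well-specced consumer machine, and smaller models shrink proportionally.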
Parent may be thinking more along the lines of a “hope we can print all the knowledge“ type catastrophe. Though if there is zero compute it’ll be tough reading all those disks!
I use a tool called LM Studio, which makes it trivial to run these models on a Mac. You can also use it as a local API, so it kinda acts like a drop-in replacement for the OpenAI API.
The processing required to run current language models with a useful amount of knowledge encoded in them is way more than I imagine would be available in a "world scale disaster".
Great links, especially the last one referencing the Goto paper: https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/...

>> I believe the trick with CPU math kernels is exploiting instruction level parallelism with fewer memory references

It's a collection of tricks to minimize all sorts of cache misses (L1, L2, TLB, page misses, etc.), improve register reuse, leverage SIMD instructions, transpose one of the matrices if it provides better spatial locality, and so on.
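As a concrete, heavily simplified illustration of the loop-tiling part of that (the function name and block size here are placeholders; a real kernel in the Goto style adds packing, register blocking, and SIMD on top):

```cpp
#include <algorithm>
#include <cstddef>

// C += A * B for row-major MxK and KxN matrices, tiled so that the
// BS x BS working set of each operand can stay resident in cache while
// it is reused. C must be zero-initialized by the caller for a plain
// C = A * B.
void gemm_blocked(const float* A, const float* B, float* C,
                  std::size_t M, std::size_t N, std::size_t K,
                  std::size_t BS = 64) {
    for (std::size_t i0 = 0; i0 < M; i0 += BS)
        for (std::size_t k0 = 0; k0 < K; k0 += BS)
            for (std::size_t j0 = 0; j0 < N; j0 += BS)
                for (std::size_t i = i0; i < std::min(i0 + BS, M); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + BS, K); ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < std::min(j0 + BS, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```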
Tinyllama isn't going to be doing what ChatGPT does, but it still beats the pants off what we had for completion or sentiment analysis 5 years ago. And now a Pi can run it decently fast.
> I learned how to write math kernels by renting Vast VMs and watching Gautham Venkatasubramanian and mrdomino develop CUDA kernels in a tmux session. They've been focusing on solving a much more important challenge for llamafile, which is helping it not have a mandatory dependency on the cuBLAS
If I'm reading this right, they're trying to rewrite cuBLAS within CUDA itself. I'm guessing the next step would be removing the CUDA dependency and directly using Vulkan or Metal compute shaders instead. Am I correct?