Sure (if they are OpenAI-API-compatible I can add them within minutes), otherwise I'm open to pull requests :) Also, I don't own an Nvidia card or a Windows / macOS machine.
Impressive. I don't think I've seen a local model call upon specialised modules yet (although I can't keep up with everything going on). I too use the local 7B OpenHermes and it's really good.
It's my go-to "structured text model" at the moment. Try "starling-lm-beta" (7B) for some very impressive chat capabilities. I honestly think it outperforms GPT-3 half the time.
Thank you! From that page, at the bottom, I was able to find this link to what I think are the quantized versions: https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-... If you have the time, could you explain what you mean by "Q5 is minimum"? Did you determine that by trying the different models and finding this one is best, did someone else do that evaluation, or is it just generally accepted knowledge? Sorry, I find this whole ecosystem quite confusing still, but I'm very new and that's not your problem.
This is so cool! And the fact that you can use Ollama as the LLM backend makes it sustainable. I didn't see how to switch models in the demo; that might be worth highlighting in the README.
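For anyone wondering what switching models through Ollama could look like, here is a minimal sketch against Ollama's HTTP chat endpoint (not the project's actual code; the model names are just examples):

    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"

    def ask(model: str, prompt: str) -> str:
        # Ollama loads whichever model the request names, so "switching" is just changing this field.
        resp = requests.post(OLLAMA_URL, json={
            "model": model,  # e.g. "openhermes" or "starling-lm"
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        })
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    print(ask("openhermes", "Name three local LLM runners."))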
Well, I don't intend to get money from people, so I guess showing real results isn't a "problem". Besides, I think the following sentences aren't wrong? It's just a 7B model, give it some slack haha
Fine-tuning on your own knowledge probably isn't what you want to do; you probably want retrieval-augmented generation instead. Basically a search engine over some local documents, where you put the results of the search into your prompt. The search engine uses the same vector space as your language model for its index, so the results should be highly relevant to whatever the prompt is (rough sketch after this comment).

I'd start with "librechat" and Mistral; so far that's one of the best chat interfaces, and it has good support for self-hosting. For the actual model runner, ollama seems to be the way to go. I believe librechat is a wrapper around "langchain", so once you've tested all your queries and your setup in librechat, you can switch to langchain directly when it makes sense to. And if librechat's API doesn't do what you want, well, I've always found FastAPI pleasant to work with.

---

Less for your use case and more in general: I've been assessing a lot of LLM interfaces lately, and the weird porn community has some really powerful and flexible ones. With sillytavern you can set up multiple agents: have one agent write the code, another critique it, and a third assess it for security concerns. That kind of feedback loop can catch a lot of LLM mistakes. You can also go back and edit the LLM's response, which really helps; if you edit an LLM message to fix code or change variable names, it will tend to stick with those decisions. But those interfaces are still very much optimized for "role playing".

Recommend keeping an eye on https://www.reddit.com/r/LocalLLaMA/
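To make the retrieval-augmented generation idea above concrete, here is a minimal sketch; it assumes sentence-transformers for embeddings and plain cosine similarity in place of a real vector DB, and the documents are purely illustrative:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Your local documents, already split into small chunks.
    chunks = [
        "Invoices are stored under /srv/docs/invoices, sorted by year.",
        "The VPN config lives in /etc/wireguard/wg0.conf.",
        "Backups run nightly via a systemd timer called backup.timer.",
    ]
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    def retrieve(query: str, k: int = 2) -> list[str]:
        # Query and chunks share one vector space, so the nearest chunks are the most relevant.
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    question = "Where do backups get configured?"
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # `prompt` then goes to whatever local model you run (Ollama, llama.cpp, ...).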
It says it's a "locally running search engine", but I'm not sure how it finds the sites and pages to index in the first place?
Yeah, I guess that's misleading; I should probably change that. I was referring to the LLM part as locally running. Indexing is still done by the big guys and queried through SearXNG.
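For reference, pulling result URLs out of a SearXNG instance could look roughly like this (just a sketch; it assumes a local instance with the JSON output format enabled, and the exact response fields may differ):

    import requests

    def search_urls(query: str, n: int = 5) -> list[str]:
        # SearXNG aggregates the big engines; we only need URLs back to scrape ourselves.
        resp = requests.get(
            "http://localhost:8080/search",
            params={"q": query, "format": "json"},
            timeout=10,
        )
        resp.raise_for_status()
        return [r["url"] for r in resp.json().get("results", [])[:n]]

    print(search_urls("open source perplexity alternative"))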
Awesome project! As a newbie myself in everything LLM, where should I start looking to create a similar project to yours? Which resources/projects are good to know about? Thank you for sharing!
Excellent work! I plan to use it with existing LLMs tbh, but great to see it working locally also! Thank you so much for sharing. I love the architecture.
Did you really make a Perplexity clone if you didn't spend more time promoting yourself on Twitter and LinkedIn than on the engineering?
Wait, are you directly comparing Perplexity with C.ai or Pi? Perplexity is a search engine, Pi is a chatbot, and C.ai is roleplay. Their value propositions are very different.
|
It's basically an LLM with access to a search engine and the ability to query a vector DB.
The top n results from each search query (initiated by the LLM) get scraped, split into little chunks, and saved to the vector DB. The LLM can then query this vector DB for the relevant chunks. This obviously isn't as comprehensive as having a 128k-context LLM just summarize everything, but at least on local hardware it's a lot faster and far more resource-friendly. The demo on GitHub runs on a normal consumer GPU (an AMD RX 6700 XT with 12 GB of VRAM).
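A rough sketch of that scrape, chunk and vector-DB loop, just to illustrate the flow (requests and chromadb here are stand-ins, not necessarily what the project actually uses):

    import requests
    import chromadb

    db = chromadb.Client()
    col = db.create_collection("web_chunks")

    def ingest(urls: list[str], chunk_size: int = 500) -> None:
        # Scrape each search result, split it into small chunks, and store them in the vector DB.
        for url in urls:
            text = requests.get(url, timeout=10).text  # real code would strip the HTML first
            pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
            col.add(documents=pieces, ids=[f"{url}#{i}" for i in range(len(pieces))])

    def relevant_chunks(question: str, k: int = 4) -> list[str]:
        # The LLM only ever sees these few chunks instead of the full pages.
        return col.query(query_texts=[question], n_results=k)["documents"][0]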