Original link: https://news.ycombinator.com/item?id=43296918
This Hacker News thread discusses rlama.dev, an open-source DocumentAI tool that uses Ollama for local LLM processing. The initial feedback highlights a key limitation: it lacks document chunking, which prevents it from handling large documents (such as books) effectively. Users point out that the current implementation loads the entire document, exceeding the embedding model's token limit and hurting retrieval accuracy. The developer, Dontizi, responded that an implementation with overlapping chunking is currently being tested to improve RAG performance on large texts. The discussion also covers the heavy resource demands of long-context prompts with local models, along with suggestions about hierarchical search approaches and the use of metadata. Users requested features such as an API for integrations and architecture documentation, and raised security considerations around filesystem access. The developer described a Go-based stack using Cobra for the CLI, the Ollama API for LLM integration, and the local filesystem for storage, with future improvements planned based on community feedback.
The embedding model (bge-m3 in this case) has a sequence length of 8192 tokens; rlama tries to embed the whole book, so Ollama can only fit the first few pages into the embedding request.
Then when retrieving, it retrieves the entire document instead of the relevant passage (because there is no chunking), but truncates this to the first 1000 characters, i.e. the first half-page of the Table of Contents.
As a result, when queried, the model says: "There is no direct mention of the Buddha in the provided documents." (The word Buddha appears 44,121 times in the documents I indexed.)
A better solution (and, as far as I can tell, what every other RAG does) is to split the document into chunks that can actually fit the context of the embedding model, and then retrieve those chunks -- ideally with metadata about which part of the document it's from.
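As a rough illustration of that approach, here is a minimal sketch in Go (matching the developer's stated stack) of fixed-size chunking with overlap, keeping per-chunk metadata for later citation. The `Chunk` type, field names, and whitespace-based token approximation are assumptions for the example, not rlama's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a slice of the source document plus metadata for retrieval and
// citation. Field names are illustrative, not taken from rlama.
type Chunk struct {
	DocID    string // source document
	Index    int    // position of the chunk within the document
	StartTok int    // offset of the first token in the chunk
	Text     string
}

// splitIntoChunks breaks a document into overlapping windows of roughly
// chunkSize tokens, stepping forward by chunkSize-overlap each time, so each
// chunk fits the embedding model's context (e.g. bge-m3's 8192 tokens).
// Tokens are approximated by whitespace splitting to keep the sketch short.
func splitIntoChunks(docID, text string, chunkSize, overlap int) []Chunk {
	words := strings.Fields(text)
	step := chunkSize - overlap
	var chunks []Chunk
	for start, idx := 0, 0; start < len(words); start, idx = start+step, idx+1 {
		end := start + chunkSize
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, Chunk{
			DocID:    docID,
			Index:    idx,
			StartTok: start,
			Text:     strings.Join(words[start:end], " "),
		})
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	doc := strings.Repeat("the quick brown fox jumps over the lazy dog ", 500)
	chunks := splitIntoChunks("book.pdf", doc, 512, 64)
	fmt.Printf("produced %d chunks; chunk 1 starts at token %d\n",
		len(chunks), chunks[1].StartTok)
}
```

Each chunk would then be embedded and indexed separately, and retrieval returns the best-matching chunks rather than the whole document.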
---
I'd also recommend showing the search results to the user (I think just having a vector search engine is already an extremely useful feature, even without the AI summary / question answering), and altering the prompt to provide references (e.g. based on chunk metadata such as page numbers).
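For the citation idea, a small sketch of how retrieved chunks and their metadata could be folded into the prompt so the answer can point back to a document and page; the `hit` type and prompt wording are hypothetical, not rlama's actual format.

```go
package main

import (
	"fmt"
	"strings"
)

// hit is a retrieved chunk plus the metadata needed for a citation.
// Field names are illustrative, not taken from rlama.
type hit struct {
	DocID string
	Page  int
	Text  string
}

// buildPrompt shows the retrieved passages to the model and asks it to cite
// them, so answers can reference a document and page number.
func buildPrompt(question string, hits []hit) string {
	var b strings.Builder
	b.WriteString("Answer using only the passages below and cite them as [doc, p.N].\n\n")
	for _, h := range hits {
		fmt.Fprintf(&b, "[%s, p.%d]\n%s\n\n", h.DocID, h.Page, h.Text)
	}
	fmt.Fprintf(&b, "Question: %s\n", question)
	return b.String()
}

func main() {
	hits := []hit{{DocID: "book.pdf", Page: 212, Text: "Then the Buddha addressed the monks..."}}
	fmt.Println(buildPrompt("Where does the text mention the Buddha?", hits))
}
```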