You're right, and it's also possible to still use LLMs and vector search in such a system, but instead you use them to enrich the queries made to traditional, pre-existing knowledge bases and search systems. Arguably you could call this "generative assisted retrieval", or GAR... sadly I didn't coin the term, there's a paper about it ;-) https://aclanthology.org/2021.acl-long.316/
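For illustration, a minimal sketch of that pattern, where the LLM only enriches the query and an existing OpenSearch index does the actual retrieval, might look like this (the index name, field name, model, and prompt are all placeholders, not anything from the paper):

```python
# Sketch of "generative assisted retrieval": the LLM rewrites/expands the
# user's question; retrieval stays plain keyword search against an existing
# OpenSearch index. Index name, field, model, and prompt are placeholders.
import requests
from openai import OpenAI

OPENSEARCH_URL = "http://localhost:9200"
INDEX = "docs"  # hypothetical pre-existing index
llm = OpenAI()

def expand_query(user_query: str) -> str:
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the question as a short list of search keywords, including likely synonyms."},
            {"role": "user", "content": user_query},
        ],
    )
    return resp.choices[0].message.content

def search(user_query: str, k: int = 10) -> list[str]:
    keywords = expand_query(user_query)
    body = {"size": k, "query": {"multi_match": {"query": keywords, "fields": ["text"]}}}
    hits = requests.post(f"{OPENSEARCH_URL}/{INDEX}/_search", json=body).json()
    return [h["_source"]["text"] for h in hits["hits"]["hits"]]
```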
I always wondered why a RAG index has to be a vector DB.
If the model understands text/code and can generate text/code, it should be able to talk to OpenSearch no problem.
I've been using SQLite FTS (which is essentially BM25) and it works so well I haven't really bothered with vector databases, or Postgres, or anything else yet. Maybe when my corpus exceeds 2GB...
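If anyone wants to try that route, a minimal FTS5 setup (table and column names are just examples) is roughly:

```python
# Minimal SQLite FTS5 index; FTS5's bm25() ranks matches out of the box.
import sqlite3

db = sqlite3.connect("corpus.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")
db.execute("INSERT INTO docs (title, body) VALUES (?, ?)",
           ("RAG notes", "BM25 keyword search is often a strong baseline."))
db.commit()

# bm25(docs) returns a rank where more negative means a better match,
# so ascending order puts the best hits first.
rows = db.execute(
    "SELECT title, bm25(docs) AS score FROM docs WHERE docs MATCH ? "
    "ORDER BY score LIMIT 10",
    ("bm25 OR keyword",),
).fetchall()
print(rows)
```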
The biggest one is that it's hard to get "zero matches" from an embeddings database. You get back all results ordered by distance from the user's query, but it will really scrape the bottom of the barrel if there aren't any great matches - which can lead to bugs like this one: https://simonwillison.net/2024/Jun/6/accidental-prompt-injec...

The other problem is that embeddings search can miss things that a direct keyword match would have caught. If you have key terms that are specific to your corpus - product names for example - there's a risk that a vector match might not score those as highly as BM25 would have, so you may miss the most relevant documents.

Finally, embeddings are much more black box and hard to debug and reason about. We have decades of experience tweaking and debugging and improving BM25-style FTS search - the whole field of "Information Retrieval". Throwing that all away in favour of weird new embedding vectors is suboptimal.
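One common mitigation for the "no such thing as zero matches" problem is a plain similarity cutoff with a keyword fallback. A rough sketch, where the 0.75 threshold is arbitrary and corpus-dependent and `keyword_search` is a stand-in for whatever FTS you already have:

```python
# Sketch: drop embedding hits below a cosine-similarity cutoff and fall back
# to keyword search when nothing clears it. The 0.75 threshold is arbitrary;
# it has to be tuned per corpus and per embedding model.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_text, query_vec, docs, doc_vecs, keyword_search,
             threshold=0.75, k=5):
    scored = sorted(
        ((cosine_sim(query_vec, v), d) for v, d in zip(doc_vecs, docs)),
        key=lambda pair: pair[0], reverse=True,
    )
    good = [d for score, d in scored[:k] if score >= threshold]
    if good:
        return good
    # Nothing was actually similar: better to hand off to BM25/FTS than to
    # return the least-bad embedding matches.
    return keyword_search(query_text, k)
```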
I'll add to what the other commenter noted: sometimes the difference between results gets very granular (i.e. 0.65789 vs 0.65788), so deciding where that threshold should be is a little trickier.
Honestly you clocked the secret: it doesn’t.
It makes sense for the hype, though. As we got LLMs we also got wayyyy better embedding models, but neither depends on the other.
Text embeddings don't capture inferred data: "the second letter of this text" does not embed close to "e". LLM chain of thought is required to deduce the meaning more completely.
You don't need to update the whole model for everyone. Fine-tuning exists and is even available as a service from OpenAI. The updates are only visible in the specific fine-tuned models you use.
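For reference, kicking off a fine-tune through that service is roughly the following (the file name and base model are placeholders; check the current docs for which models support fine-tuning):

```python
# Rough sketch of starting an OpenAI fine-tuning job. "train.jsonl" holds
# chat-formatted training examples; the base model name is a placeholder.
from openai import OpenAI

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-4o-mini-2024-07-18")
# The resulting fine-tuned model is only visible to your own account/org.
print(job.id, job.status)
```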
I would mildly disagree with this; Neo4j just serves as an underlying storage mechanism, much like Postgres+pgvector could be the underlying storage mechanism for embedding-only RAG. How one extracts entities and connects them in the graph happens a layer above the storage layer of Neo4j (though Neo4j can also do this internally). Neo4j is not magic; the application layer and data modelling still have to define which entities exist and how they are connected.

But why Neo4j? Neo4j has some nice amenities for building GRAG on top of. In particular, it has packages to support community partitioning, including Leiden[0] (also used by Microsoft's GraphRAG[1]) and Louvain[2], as well as several other community detection algorithms. The built-in support for node embeddings[3] as well as external AI APIs[4] makes the DX -- in so far as building the underlying storage for complex retrieval -- quite good, IMO.

The approach that we are taking is to import a corpus of information into Neo4j and perform ETL on the way in to create additional relationships, effectively connecting individual chunks by some related "facet". Then we plan to run community detection over it to identify communities of interest and use a combination of communities, locality, and embedding match to retrieve chunks.

I just started exploring it over the past week and I would say that if your team is going to end up doing some more complex GRAG, then Neo4j feels like it has the right tooling to be the underlying storage layer, and you could even feasibly implement other parts of your workflow in there as well, but entity extraction and such feels like it belongs one layer up, in the application layer. Most notably, having direct query access to the underlying graph with a graph query language (Cypher) means that you will have more control and different ways to experiment with retrieval.

However, as I mentioned, I would encourage most teams to be more clever with embedding RAG before adding more infrastructure like Neo4j.

[0] https://neo4j.com/docs/graph-data-science/current/algorithms...

[1] https://microsoft.github.io/graphrag/

[2] https://neo4j.com/docs/graph-data-science/current/algorithms...

[3] https://neo4j.com/docs/graph-data-science/current/machine-le...
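To make the community-detection part concrete, here is a rough sketch of running Leiden over a chunk graph through the Python driver and the GDS procedures linked above; the connection details, node label, relationship type, and the `id` property are illustrative assumptions, not part of any particular schema:

```python
# Sketch: Leiden community detection over a chunk graph with Neo4j GDS.
# Label 'Chunk', relationship 'RELATED_TO', and the 'id' property are
# illustrative; GDS Leiden expects an undirected projection.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Project the chunk graph into GDS's in-memory graph catalog.
    session.run(
        "CALL gds.graph.project('chunks', 'Chunk', "
        "{RELATED_TO: {orientation: 'UNDIRECTED'}})"
    )
    # Stream a Leiden community id for every chunk node.
    result = session.run(
        """
        CALL gds.leiden.stream('chunks')
        YIELD nodeId, communityId
        RETURN gds.util.asNode(nodeId).id AS chunk, communityId
        ORDER BY communityId
        """
    )
    for record in result:
        print(record["chunk"], record["communityId"])

driver.close()
```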
An interesting paper that was recently published takes a different approach: Human-like Episodic Memory for Infinite Context LLMs, https://arxiv.org/abs/2407.09450

This wasn't focused on RAG, but there seems to be a lot of crossover to me. Using the LLM to make "episodes" is a similar problem to chunking, and letting the LLM decide the boundary might also yield good results.
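A toy sketch of that "let the model pick the boundaries" idea (not the paper's method, just the suggestion in the comment; the model name and prompt are made up) could look like:

```python
# Toy sketch: ask an LLM to propose episode/chunk boundaries instead of
# splitting on a fixed token count. Model name and prompt are invented.
from openai import OpenAI

client = OpenAI()

def llm_chunk(paragraphs: list[str]) -> list[list[str]]:
    numbered = "\n".join(f"{i}: {p}" for i, p in enumerate(paragraphs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Group these numbered paragraphs into coherent episodes. "
                       "Reply only with the index of the first paragraph of each "
                       "episode, comma-separated.\n\n" + numbered,
        }],
    )
    raw = resp.choices[0].message.content
    # Always start a chunk at 0; ignore anything that isn't a valid index.
    starts = sorted({0} | {int(s) for s in raw.replace(" ", "").split(",") if s.isdigit()})
    starts = [s for s in starts if s < len(paragraphs)]
    bounds = starts + [len(paragraphs)]
    return [paragraphs[a:b] for a, b in zip(bounds, bounds[1:])]
```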
How do you weight results between vector search and BM25? Do you fall back to BM25 when vector similarity is below a threshold, or do you tweak the weights by hand for each data set?

I never could get much beyond the basic search piece. I don't see how mixing in a black-box AI model with probabilistic outcomes could add any value without having this working first.
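For what it's worth, one common way to combine the two lists without comparing their incompatible scores is reciprocal rank fusion, which merges by rank; a minimal sketch (k=60 is the constant from the original RRF paper, and the optional weights are only there if you really do want to favour one ranker by hand):

```python
# Reciprocal Rank Fusion: merge BM25 and vector result lists by rank rather
# than by raw score. k=60 is the conventional constant; the optional weights
# let you bias towards one ranker.
def rrf(keyword_hits: list[str], vector_hits: list[str],
        k: int = 60, w_kw: float = 1.0, w_vec: float = 1.0) -> list[str]:
    scores: dict[str, float] = {}
    for weight, hits in ((w_kw, keyword_hits), (w_vec, vector_hits)):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: rrf(bm25_ids, vector_ids)[:10] -> fused top-10 document ids
```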