Muvera: Making multi-vector retrieval as fast as single-vector search

Original link: https://research.google/blog/muvera-making-multi-vector-retrieval-as-fast-as-single-vector-search/

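Neural embedding models are the foundation of information retrieval (IR): they represent data as vectors and use inner-product similarity to enable efficient search. Multi-vector models such as ColBERT improve accuracy by representing each datapoint with multiple embeddings and scoring with a more sophisticated similarity measure, such as Chamfer similarity. However, this gain comes at the cost of significantly higher computational complexity. “MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings” addresses this challenge by reducing multi-vector retrieval to a more manageable problem. It constructs fixed dimensional encodings (FDEs) for queries and documents: single vectors whose inner product approximates the multi-vector similarity. This allows optimized maximum inner product search (MIPS) algorithms to retrieve an initial set of candidates, which are then re-ranked with the exact multi-vector similarity. MUVERA effectively bridges the efficiency gap between single-vector and multi-vector retrieval, achieving high accuracy without sacrificing speed, and it has been open-sourced on GitHub.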

Original text

Neural embedding models have become a cornerstone of modern information retrieval (IR). Given a query from a user (e.g., “How tall is Mt Everest?”), the goal of IR is to find information relevant to the query from a very large collection of data (e.g., the billions of documents, images, or videos on the Web). Embedding models transform each datapoint into a single-vector “embedding”, such that semantically similar datapoints are transformed into mathematically similar vectors. The embeddings are generally compared via the inner-product similarity, enabling efficient retrieval through optimized maximum inner product search (MIPS) algorithms. However, recent advances, particularly the introduction of multi-vector models like ColBERT, have demonstrated significantly improved performance in IR tasks.
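To make the single-vector setting concrete, here is a minimal brute-force sketch of maximum inner product search. The function name `brute_force_mips` and the toy vectors are hypothetical and chosen only for illustration; production MIPS systems use approximate indexes rather than scanning every vector.

```python
import numpy as np

def brute_force_mips(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k corpus rows with the largest inner product
    with the query, ordered from best to worst."""
    scores = corpus @ query                      # one inner product per row
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]    # unordered top-k
    return top[np.argsort(-scores[top])]         # order the top-k by score

# Toy corpus of four 2-dimensional "embeddings" (illustrative values only).
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([1.0, 0.2])
top2 = brute_force_mips(query, corpus, k=2)
print(top2)
```

The exhaustive scan costs one inner product per stored vector, which is the work that optimized MIPS algorithms are designed to avoid.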

Unlike single-vector models, multi-vector models represent each datapoint with a set of embeddings, and leverage more sophisticated similarity functions that can capture richer relationships between datapoints. For example, the popular Chamfer similarity measure used in state-of-the-art multi-vector models captures when the information in one multi-vector embedding is contained within another multi-vector embedding. While this multi-vector approach boosts accuracy and enables retrieving more relevant documents, it introduces substantial computational challenges. In particular, the increased number of embeddings and the complexity of multi-vector similarity scoring make retrieval significantly more expensive.
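The post does not write out the formula, so the sketch below assumes the commonly used definition of Chamfer similarity: for each query vector, take the maximum inner product over the document's vectors, then sum these maxima. The function name and toy matrices are hypothetical.

```python
import numpy as np

def chamfer_similarity(Q: np.ndarray, P: np.ndarray) -> float:
    """Assumed definition: sum over rows q of Q of max over rows p of P of <q, p>."""
    sims = Q @ P.T                        # shape (|Q|, |P|): all pairwise inner products
    return float(sims.max(axis=1).sum())  # best match per query vector, then sum

# Two query vectors pointing along different axes (illustrative values only).
Q = np.array([[1.0, 0.0], [0.0, 1.0]])

# This document has a close match for both query vectors.
P_cover = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])

# This document matches only the first query vector well.
P_partial = np.array([[1.0, 0.0], [0.5, 0.5]])

print(chamfer_similarity(Q, P_cover))    # 2.0
print(chamfer_similarity(Q, P_partial))  # 1.5
```

The score is highest when every query vector finds a good match in the document, which is the containment behavior described above. It also shows the cost: scoring one query against one document requires |Q| x |P| inner products instead of one.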

In “MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings”, we introduce a novel multi-vector retrieval algorithm designed to bridge the efficiency gap between single- and multi-vector retrieval. We transform multi-vector retrieval into a simpler problem by constructing fixed dimensional encodings (FDEs) of queries and documents, which are single vectors whose inner product approximates multi-vector similarity, thus reducing complex multi-vector retrieval back to single-vector maximum inner product search (MIPS). This new approach allows us to leverage highly optimized MIPS algorithms to retrieve an initial set of candidates that can then be re-ranked with the exact multi-vector similarity, thereby enabling efficient multi-vector retrieval without sacrificing accuracy. We have provided an open-source implementation of our FDE construction algorithm on GitHub.
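The two-stage flow described above (approximate candidate retrieval on FDEs, then exact re-ranking) can be sketched as follows. This is not the open-source implementation: the text here does not spell out how an FDE is built, so the construction below is an assumption for illustration only. It partitions space with random hyperplanes, sums query vectors per bucket, averages document vectors per bucket, and concatenates the buckets into one vector. All function names, parameters, and the random data are hypothetical, and the Chamfer definition is the commonly used sum-of-maxima form.

```python
import numpy as np

def chamfer(Q, P):
    """Assumed exact multi-vector score: sum of per-query-vector max inner products."""
    return float((Q @ P.T).max(axis=1).sum())

def bucket_ids(X, planes):
    """Bucket index per row of X from its sign pattern against random hyperplanes."""
    bits = (X @ planes.T > 0).astype(int)                 # shape (n, k)
    return bits @ (1 << np.arange(planes.shape[0]))       # bits -> integer id

def fde(X, planes, is_query):
    """Illustrative fixed dimensional encoding of length 2**k * d."""
    k, d = planes.shape
    out = np.zeros((2 ** k, d))
    ids = bucket_ids(X, planes)
    for b in np.unique(ids):
        members = X[ids == b]
        # Assumption: queries are summed per bucket, documents are averaged.
        out[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
    return out.ravel()

def retrieve(Q, docs, planes, n_candidates, k):
    """Stage 1: rank documents by FDE inner product. Stage 2: re-rank with Chamfer."""
    q_fde = fde(Q, planes, is_query=True)
    d_fdes = np.stack([fde(P, planes, is_query=False) for P in docs])
    approx = d_fdes @ q_fde                               # single-vector scores
    cand = np.argsort(-approx)[:n_candidates]             # brute force stands in for MIPS
    exact = np.array([chamfer(Q, docs[i]) for i in cand])
    return cand[np.argsort(-exact)][:k]

rng = np.random.default_rng(0)
dim, k_planes = 8, 3
planes = rng.standard_normal((k_planes, dim))
docs = [rng.standard_normal((rng.integers(3, 7), dim)) for _ in range(20)]
Q = docs[5][:2] + 0.01 * rng.standard_normal((2, dim))    # query near document 5
top = retrieve(Q, docs, planes, n_candidates=10, k=3)
print(top)
```

In this sketch the document FDEs are recomputed on every call for brevity; in practice they would be computed once and stored in a MIPS index. The candidate count trades speed for recall: with `n_candidates` equal to the corpus size, the result matches exhaustive Chamfer ranking.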
