MariaDB innovation: vector index performance

Original link: http://smalldatum.blogspot.com/2026/02/mariadb-innovation-vector-index.html

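## MariaDB 12.3 vector search performance summary

Recent benchmarks, sponsored by the MariaDB Foundation and run by Small Datum LLC, show that MariaDB 12.3 delivers a significant vector search performance improvement over MariaDB 11.8. In tests with the dbpedia-openai-X-angular datasets (100k, 500k and 1000k), MariaDB 12.3 consistently had the best recall vs precision results.

The gains in MariaDB 12.3 are larger on the larger datasets. Analysis with `vmstat` indicates that the improvement comes from lower CPU usage per query. MariaDB 11.8 also outperformed Postgres 18.2 with pgvector 0.8.1 in these tests.

The benchmarks ran on a Hetzner server with 48 cores and 128GB of RAM, using database builds compiled from source. Monitoring confirmed that the data was cached, so the results reflect the database engines rather than storage.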

Original article

Last year I shared many posts documenting MariaDB performance for vector search using ann-benchmarks. Performance was great in MariaDB 11 and this blog post explains that it is even better in MariaDB 12. This work was done by Small Datum LLC and sponsored by the MariaDB Foundation. My previous posts were published in January and February 2025.

tl;dr

  • Vector search recall vs precision in MariaDB 12.3 is better than in MariaDB 11.8
  • Vector search recall vs precision in MariaDB 11.8 is better than in Postgres 18.2 with pgvector 0.8.1
  • The improvements in MariaDB 12.3 are more significant for larger datasets
  • MariaDB 12.3 has the best results because it uses less CPU per query. This is confirmed by running vmstat in the background.
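
To make "CPU per query" concrete, here is a minimal sketch of one way to derive it from vmstat-style utilization and a measured query rate. The helper name `cpu_seconds_per_query`, its parameters, and the sample numbers are all assumptions for illustration; they are not the scripts or the results from this benchmark.

```python
def cpu_seconds_per_query(user_pct, sys_pct, num_cpus, qps):
    """Estimate CPU seconds consumed per query.

    user_pct and sys_pct are the 'us' and 'sy' percentages that vmstat
    reports, averaged over all CPUs. num_cpus is the number of CPUs that
    average covers, and qps is the measured queries per second.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    busy_fraction = (user_pct + sys_pct) / 100.0
    # busy_fraction * num_cpus is CPU seconds burned per wall-clock second.
    return busy_fraction * num_cpus / qps


# Made-up numbers, purely illustrative (not measurements from this post):
# the same CPU utilization at a higher query rate means less CPU per query.
older = cpu_seconds_per_query(user_pct=60, sys_pct=5, num_cpus=48, qps=4000)
newer = cpu_seconds_per_query(user_pct=60, sys_pct=5, num_cpus=48, qps=6000)
print(f"older: {older * 1000:.2f} ms CPU/query, newer: {newer * 1000:.2f} ms CPU/query")
```

Comparing this value at a similar recall level is what makes it meaningful, since a configuration that does less work per query can also return worse results.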
Benchmark

This time I used the dbpedia-openai-X-angular tests for X in 100k, 500k and 1000k.
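
For readers new to these datasets, "angular" refers to cosine distance, and recall measures how many of the true nearest neighbors an approximate index returns. The sketch below is a simplified illustration with hypothetical function names and toy 2-dimensional vectors; ann-benchmarks has its own metric code, which this does not reproduce.

```python
import math


def angular_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_neighbors(query, vectors, k):
    """Brute-force top-k ids by angular distance (the ground truth)."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: angular_distance(query, vectors[i]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data for illustration only.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
truth = exact_neighbors(query, vectors, k=2)
# Pretend an approximate index returned ids 0 and 1.
print(truth, recall_at_k([0, 1], truth))
```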

For hardware I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. 

For databases I used:

  • MariaDB versions 11.8.5 and 12.3.0 with this config file. Both were compiled from source. 
  • Postgres 18.2 with pgvector 0.8.1 with this config file. These were compiled from source. For Postgres, tests were run with and without halfvec (float16).
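
As a rough illustration of what float16 storage trades away, the sketch below round-trips a few values through IEEE half precision using Python's `struct` module and compares raw storage per vector. The 1536-dimension figure is an assumption about the OpenAI embeddings in this dataset, the helper names are hypothetical, and per-value overhead in the database is ignored.

```python
import struct


def to_float16_and_back(values):
    """Round-trip values through IEEE 754 half precision (2 bytes each)."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def storage_bytes(dimensions, bytes_per_component):
    """Raw bytes for one vector, ignoring any per-value header."""
    return dimensions * bytes_per_component


original = [0.5, 0.1, -0.0123]
rounded = to_float16_and_back(original)
for before, after in zip(original, rounded):
    print(f"{before!r} -> {after!r} (abs error {abs(before - after):.2e})")

DIMS = 1536  # assumed embedding dimension, not stated in the post
print(storage_bytes(DIMS, 4), "bytes as float32 vs",
      storage_bytes(DIMS, 2), "bytes as float16")
```

Halving the bytes per vector lets more of the index fit in cache, at the cost of a small rounding error in each component.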

I had ps and vmstat running during the benchmark and confirmed there were no storage reads, as the table and index were cached by MariaDB and Postgres.
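
One way to automate that check is to scan saved vmstat output for the `bi` (blocks read in) column. This is a minimal sketch with a hypothetical helper and fabricated sample output, not the tooling used for this benchmark. Note that the first sample vmstat prints is an average since boot, so it may be worth discarding before applying a check like this.

```python
def max_blocks_in(vmstat_text):
    """Return the largest 'bi' value among the samples in vmstat output."""
    columns = None
    largest = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name header repeats periodically in long vmstat runs.
        if "bi" in fields and "bo" in fields:
            columns = fields
            continue
        # Skip the banner line and anything that is not a numeric sample.
        if columns is None or len(fields) != len(columns):
            continue
        if not all(f.lstrip("-").isdigit() for f in fields):
            continue
        largest = max(largest, int(fields[columns.index("bi")]))
    return largest


# Fabricated sample output for illustration only.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
32  0      0 900000  10000 80000000    0    0     0   120 5000 9000 65  5 30  0  0
31  0      0 899000  10000 80000000    0    0     0    80 5100 9100 66  4 30  0  0
"""

print("max blocks read in:", max_blocks_in(SAMPLE))
```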


The command lines to run the benchmark using my helper scripts are:
    bash rall.batch.sh v1 dbpedia-openai-100k-angular c32r128

    bash rall.batch.sh v1 dbpedia-openai-500k-angular c32r128

    bash rall.batch.sh v1 dbpedia-openai-1000k-angular c32r128

Results: dbpedia-openai-100k-angular

Summary

  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is smaller here than it is below for 500k and 1000k
Results: dbpedia-openai-500k-angular

Summary

  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is larger here than above for 100k
Results: dbpedia-openai-1000k-angular

Summary

  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is larger here than it is above for 100k and 500k