Show HN: Antfly: Distributed, Multimodal Search and Memory and Graphs in Go

Original link: https://github.com/antflydb/antfly

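## Antfly: A Multimodal Distributed Search Engine

Antfly is an open-source search engine built on etcd's raft library and designed to handle multiple data types: text, images, audio, and video. It combines full-text search (BM25), vector similarity search, and graph traversal in a single query, and it automatically generates embeddings, chunks data, and extracts graph relationships.

Key features include built-in RAG agents (for retrieval-augmented generation), hybrid search, and automatic data enrichment pipelines. Antfly supports a range of model providers (Ollama, OpenAI, and others) and offers ACID transactions, document TTL, and S3 storage integration.

It can be deployed with Docker or as a single-node cluster, and it ships a web dashboard ("Antfarm") for exploration. Developers can integrate Antfly with an existing Postgres database through the `pgaf` extension, or use prebuilt React components to build search UIs.

Antfly prioritizes reliability through extensive chaos testing and formal verification. The core server is licensed under Elastic License 2.0, and most other components are under Apache 2.0.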

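## Antfly: A Multimodal Database and Search Engine

Antfly is a new distributed document database and search engine written in Go that offers full-text, vector, and graph search in one system. Created by kingcauchy, it aims to simplify development with a single-binary deployment (via `antfly swarm`) that suits both local use and scalable deployments.

Key features include multimodal indexing (images, audio, video), MongoDB-style updates, streaming RAG support, and native ML inference through a built-in service, Termite, so tasks such as embedding need no external API calls. It is built on a multi-Raft setup using etcd and Pebble, and it includes a Kubernetes operator and an MCP server.

The developer chose Elastic License v2, which permits use, modification, and self-hosting but prohibits offering Antfly as a managed service. Early user feedback focused on query capabilities (combining search types) and on the availability of a potential CLI for managing indexes and running queries directly from the terminal.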

Original text

Antfly is a distributed search engine built on etcd's raft library. It combines full-text search (BM25), vector similarity, and graph traversal over multimodal data — text, images, audio, and video. Embeddings, chunking, and graph edges are generated automatically as you write data. Built-in RAG agents tie it all together with retrieval-augmented generation.
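To make the full-text side concrete, here is a minimal sketch of textbook Okapi BM25 scoring. It is not Antfly's implementation: the function name `bm25_scores`, the whitespace tokenization, the Lucene-style IDF variant, and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    Hypothetical helper for illustration; `docs` must be a non-empty list
    of token lists.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "fix my computer screen".split(),
    "computer repair guide for beginners".split(),
    "banana bread recipe".split(),
]
scores = bm25_scores("fix computer".split(), docs)
print(scores)
```

The first document matches both query terms and ranks highest, the second matches only "computer", and the third scores zero because it shares no terms with the query.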

Quickstart

# Start a single-node cluster with built-in ML inference
go run ./cmd/antfly swarm

# Or run with Docker
docker run -p 8080:8080 ghcr.io/antflydb/antfly:omni

That gives you the Antfarm dashboard at http://localhost:8080 — playgrounds for search, RAG, knowledge graphs, embeddings, reranking, and more.

See the quickstart guide for a full walkthrough.

  • Hybrid search — full-text (BM25), dense vectors, and sparse vectors (SPLADE), all in one query
  • RAG agents — built-in retrieval-augmented generation with streaming, multi-turn chat, tool calling (web search, graph traversal), and confidence scoring
  • Graph indexes — automatic relationship extraction and graph traversal queries over your data
  • Multimodal — index and search images, audio, and video with CLIP, CLAP, and vision-language models
  • Reranking — cross-encoder reranking with score-based pruning to cut the noise
  • Aggregations — stats (sum/min/max/avg) and terms facets for analytics
  • Transactions — ACID transactions at the shard level with distributed coordination
  • Document TTL — automatic document expiration so you don't have to clean up yourself
  • S3 storage — store data in S3/MinIO/R2 for big cost savings and way faster shard splits
  • SIMD / SME acceleration — vector operations use hardware intrinsics via go-highway on x86 and ARM
  • Distributed — Raft consensus, automatic sharding and replication, horizontal scaling
  • Enrichment pipelines — configurable pipelines per index for embeddings, summaries, graph edges, and custom computed fields
  • Bring your own models — Ollama, OpenAI, Bedrock, Google, or run models locally with Termite
  • Auth — built-in user management with API keys, basic auth, and bearer tokens
  • Backup & restore — to local disk or S3
  • Kubernetes operator — deploy and manage clusters with the operator
  • MCP server — Model Context Protocol so LLMs can use Antfly as a tool
  • A2A protocol — Agent-to-Agent support for Google's A2A standard
  • Antfarm — web dashboard with playgrounds for search, RAG, knowledge graphs, embeddings, reranking, chunking, NER, OCR, and transcription
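
The hybrid search item above merges BM25, dense-vector, and sparse-vector results in one query. The text does not say how Antfly fuses those rankings, so the sketch below uses reciprocal rank fusion (RRF), a common technique, purely as an assumption. The function name, the document IDs, and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    Ties are broken by document ID so the output is deterministic.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical per-retriever results, best match first.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
sparse_ranking = ["doc_b", "doc_a", "doc_e"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking, sparse_ranking])
print(fused)
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that BM25 scores and vector similarities live on different scales.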

Documentation: antfly.io/docs

pgaf — PostgreSQL Extension

pgaf brings Antfly search into Postgres. Create an index, use the @@@ operator, and you're done:

CREATE INDEX idx_content ON docs USING antfly (content)
  WITH (url = 'http://localhost:8080/api/v1/', collection = 'my_docs');

SELECT * FROM docs WHERE content @@@ 'fix my computer';

@antfly/components gives you drop-in React components for search UIs — SearchBox, Autosuggest, Facet, Results, RAGBox, AnswerBox, plus streaming hooks like useAnswerStream and useCitations.

Termite handles the ML side: embeddings, chunking, reranking, classification, NER, OCR, transcription, generation, and more. It ships as a submodule and runs automatically in swarm mode — you don't need to set it up separately.
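
Chunking is one of the tasks listed for Termite. The text does not describe its chunking strategy, so the following is only a generic sketch of fixed-size chunking with overlap, a common way to split text before embedding. The function name `chunk_words` and its word-based `size` and `overlap` parameters are assumptions, not Termite's API.

```python
def chunk_words(text, size=5, overlap=2):
    """Split text into chunks of `size` words, where consecutive chunks
    share `overlap` words. Hypothetical helper for illustration."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


chunks = chunk_words("one two three four five six seven eight", size=5, overlap=2)
print(chunks)
```

The overlap keeps sentences that straddle a boundary retrievable from either neighboring chunk, at the cost of some duplicated storage.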

| Package | What it does | Source |
| --- | --- | --- |
| docsaf | Ingest content from filesystem, web crawl, git repos, and S3 | pkg/docsaf |
| evalaf | LLM/RAG/agent evaluation ("promptfoo for Go") | pkg/evalaf |
| Genkit plugin | Firebase Genkit integration for retrieval and docstore | pkg/genkit/antfly |

Antfly uses a multi-raft design with separate consensus groups:

  • Metadata raft — table schemas, shard assignments, cluster topology
  • Storage rafts — one per shard, handling data, indexes, and queries
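
The split between the two kinds of raft groups can be pictured as a routing table: the metadata group records which shard owns which keys, and each request is then served by that shard's own consensus group. The sketch below assumes range-based sharding, which the text does not confirm. `ShardMap`, its fields, and the shard names are hypothetical, and no actual Raft consensus is modeled.

```python
import bisect
from dataclasses import dataclass


@dataclass
class ShardMap:
    """Hypothetical stand-in for shard assignments kept by the metadata raft.

    Shard i owns keys in [split_keys[i-1], split_keys[i]); the first and
    last shards are open-ended.
    """
    split_keys: list  # sorted boundary keys
    shard_ids: list   # always len(split_keys) + 1 entries

    def shard_for(self, key):
        """Return the shard (and thus the storage raft group) owning `key`."""
        return self.shard_ids[bisect.bisect_right(self.split_keys, key)]

    def split(self, shard_id, split_key, new_shard_id):
        """Split a shard at `split_key`; the new shard takes the upper half."""
        idx = self.shard_ids.index(shard_id)
        lo = self.split_keys[idx - 1] if idx > 0 else None
        hi = self.split_keys[idx] if idx < len(self.split_keys) else None
        if (lo is not None and split_key <= lo) or (hi is not None and split_key >= hi):
            raise ValueError("split key is outside the shard's range")
        self.split_keys.insert(idx, split_key)
        self.shard_ids.insert(idx + 1, new_shard_id)


shards = ShardMap(split_keys=["m"], shard_ids=["shard-1", "shard-2"])
before = shards.shard_for("kiwi")
shards.split("shard-1", "g", "shard-3")
after = shards.shard_for("kiwi")
print(before, after)
```

Keeping the map in a separate, small consensus group means a shard split changes only this metadata and the affected shard, while the other storage groups keep serving traffic untouched.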

End-to-end chaos tests — inspired by Jepsen — cover node crashes, leader failures, shard splits under load, and cluster scaling. These tests run real multi-node clusters and inject faults to verify that Raft consensus, transactions, and replication behave correctly under failure.

Critical distributed protocols are formally specified and model-checked with TLA+.

Join the Discord for support, discussion, and updates.

Interested in contributing? See CONTRIBUTING.md.

The core server is Elastic License 2.0 (ELv2). That means you can use it, modify it, self-host it, and build products on top of it — you just can't offer Antfly itself as a managed service. Everything else — the SDKs, React components, Termite, pgaf, docsaf, evalaf — is Apache 2.0. We tried to keep as much as possible under a permissive license.
