Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Original link: https://github.com/zaydmulani09/mnemo

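**mnemo** is a "local-first" memory layer that gives large language models (LLMs) persistent, long-term memory. Unlike a standard LLM, which starts from scratch after every conversation, mnemo runs as a sidecar service: it extracts named entities and relationships from input text and stores them in a persistent, SQLite-backed knowledge graph. Key features:

* **Smart retrieval:** a 6-stage pipeline (including full-text search and graph traversal) injects highly relevant context into future LLM prompts in under 50 ms.
* **Privacy first:** runs entirely locally with zero cloud dependencies, as a single static binary.
* **Flexible integration:** works with Ollama, OpenAI, Anthropic, or any OpenAI-compatible API.
* **Developer friendly:** provides a REST API, a CLI tool, and a Python SDK for seamless integration into applications.

Whether deployed via Docker or run as a standalone binary, mnemo manages structured knowledge automatically, letting your AI applications "remember" users, concepts, and relationships across sessions without sacrificing privacy or performance.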

Hacker News: 5 points, posted by zaydmulani 29 minutes ago (github.com/zaydmulani09)

Original article

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required.



Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts — automatically, in under 50ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.
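A minimal sketch of how extraction output could be persisted in SQLite with one row per (name, type) pair, using Python's built-in sqlite3 module. The table names, columns, tuple shapes, and case-insensitive name matching are assumptions for illustration, not mnemo's actual schema (mnemo's storage layer is written in Rust), and alias merging is left out.

```python
import sqlite3

# Hypothetical schema for illustration only; mnemo's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL COLLATE NOCASE,  -- assumption: names match case-insensitively
    type TEXT NOT NULL,
    UNIQUE (name, type)                 -- at most one row per (name, type) pair
);
CREATE TABLE IF NOT EXISTS relations (
    source_id INTEGER NOT NULL REFERENCES entities(id),
    kind      TEXT    NOT NULL,
    target_id INTEGER NOT NULL REFERENCES entities(id),
    UNIQUE (source_id, kind, target_id)
);
"""


def upsert_entity(conn: sqlite3.Connection, name: str, type_: str) -> int:
    """Insert (name, type) unless it already exists, then return its row id."""
    conn.execute("INSERT OR IGNORE INTO entities (name, type) VALUES (?, ?)", (name, type_))
    row = conn.execute(
        "SELECT id FROM entities WHERE name = ? AND type = ?", (name, type_)
    ).fetchone()
    return row[0]


def store_extraction(conn, entities, relations):
    """Persist one ingest's extraction result in a single transaction.

    entities:  [(name, type), ...]
    relations: [((src_name, src_type), kind, (dst_name, dst_type)), ...]
    """
    with conn:  # commits on success, rolls back if any statement fails
        for name, type_ in entities:
            upsert_entity(conn, name, type_)
        for src, kind, dst in relations:
            conn.execute(
                "INSERT OR IGNORE INTO relations (source_id, kind, target_id) VALUES (?, ?, ?)",
                (upsert_entity(conn, *src), kind, upsert_entity(conn, *dst)),
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_extraction(
    conn,
    [("Alice", "person"), ("Stripe", "organization")],
    [(("Alice", "person"), "works_at", ("Stripe", "organization"))],
)
# A later ingest that mentions "alice" again reuses the existing row.
store_extraction(conn, [("alice", "person")], [])
print(conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0])  # 2
```

The UNIQUE constraint lets the database enforce deduplication, so concurrent or repeated ingests cannot create a second row for the same (name, type).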


  your app
     │
     ▼
  POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                        │
  POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
     │
     ▼
  context_prompt  ──► inject into your LLM prompt

  1. You POST raw text to /ingest (a conversation turn, a document, a note).
  2. mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
  3. Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
  4. On POST /retrieve, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a context_prompt string.
  5. You inject context_prompt into your LLM's system prompt. Done.
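The retrieval stages listed above can be sketched over in-memory Python data. This is a toy under stated assumptions: keyword overlap stands in for real full-text search, entities are matched only when their whole name appears in the query, and the 1 / (1 + hops) scoring and the layout of the context string are invented for illustration, not taken from mnemo-core.

```python
import re
from collections import deque


def tokens(text):
    """Lower-cased word set used for naive keyword matching."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, chunks, entities, edges, depth=1, allowed_relations=None, top_k=3):
    """Toy six-stage retrieval over in-memory data.

    chunks:   list of raw text snippets
    entities: {entity_id: name}
    edges:    {entity_id: [(relation_kind, neighbor_id), ...]}
    """
    q = tokens(query)
    # 1. Full-text chunk search: keep chunks sharing at least one term with the query.
    chunk_hits = [(len(q & tokens(c)) / len(q), c) for c in chunks if q & tokens(c)]
    # 2. Entity name search: seed with entities whose whole name occurs in the query.
    seeds = [eid for eid, name in entities.items() if tokens(name) <= q]
    # 3. Graph expansion: breadth-first search out to `depth` hops from the seeds,
    # 4. applying the relation filter to every edge it follows.
    hops = {eid: 0 for eid in seeds}
    facts = []
    frontier = deque(seeds)
    while frontier:
        eid = frontier.popleft()
        if hops[eid] == depth:
            continue
        for kind, nbr in edges.get(eid, []):
            if allowed_relations is not None and kind not in allowed_relations:
                continue
            facts.append(f"{entities[eid]} {kind} {entities[nbr]}")
            if nbr not in hops:
                hops[nbr] = hops[eid] + 1
                frontier.append(nbr)
    # 5. Score + rank (assumed scoring): entities by 1 / (1 + hops), chunks by term overlap.
    score = {eid: 1 / (1 + h) for eid, h in hops.items()}
    ranked_entities = sorted(score, key=lambda e: (-score[e], entities[e]))[:top_k]
    ranked_chunks = [c for _, c in sorted(chunk_hits, key=lambda hit: -hit[0])][:top_k]
    # 6. Assemble the context_prompt string.
    lines = ["Relevant memory:"]
    lines += [f"- entity: {entities[e]}" for e in ranked_entities]
    lines += [f"- fact: {fact}" for fact in dict.fromkeys(facts)]  # dedupe, keep order
    lines += [f"- note: {c}" for c in ranked_chunks]
    return "\n".join(lines)


entities = {1: "Alice", 2: "Stripe", 3: "payment infrastructure"}
edges = {1: [("works_at", 2), ("works_on", 3)]}
chunks = ["Alice is a principal engineer at Stripe.", "I prefer dark mode."]
print(retrieve("what does Alice work on?", chunks, entities, edges))
```

Passing `allowed_relations={"works_on"}` shows the filter stage at work: the `works_at` edge is skipped, so Stripe is never reached by the graph expansion.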

Path A — Docker + Ollama (fully free, recommended)

git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health
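If a script needs to wait for the stack to come up (the first Ollama model pull can take a while), a small polling helper using only the Python standard library might look like this. It only checks for an HTTP 200 from GET /health and does not inspect the HealthResponse fields; the function name, retry count, and delay are hypothetical.

```python
import time
from urllib.request import urlopen


def wait_for_health(base_url="http://localhost:8080", attempts=30, delay_s=2.0):
    """Poll GET /health until it returns HTTP 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # URLError/HTTPError, refused connections and timeouts all subclass OSError
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)  # no sleep after the final attempt
    return False
```

Call `wait_for_health()` right after `docker compose up -d`; it returns False instead of raising if the server never answers.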

Path B — Binary (Ollama or OpenAI running separately)

cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api

from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))

All endpoints accept and return application/json. Base URL: http://localhost:8080.

| Method | Path | Description | Request (body, query, or header) | Response |
|---|---|---|---|---|
| GET | /health | Server + DB + LLM status | | HealthResponse |
| POST | /ingest | Store text, extract entities | IngestRequest | IngestResponse |
| POST | /retrieve | Retrieve ranked memory context | RetrievalQuery | RetrievalResult |
| GET | /entities | List entities (paginated) | ?limit&offset | Entity[] |
| GET | /entities/:id | Get entity by UUID | | Entity |
| DELETE | /entities/:id | Delete entity (cascades) | | {"deleted":true} |
| GET | /entities/:id/neighbors | Knowledge graph neighbors | ?depth (max 5) | GraphNode[] |
| GET | /chunks | List memory chunks (paginated) | ?limit&offset&session_id | MemoryChunk[] |
| GET | /chunks/:id | Get chunk by UUID | | MemoryChunk |
| DELETE | /chunks/:id | Delete chunk | | {"deleted":true} |
| POST | /search | Full-text search over entities + chunks | {"query","limit"} | {"entities","chunks"} |
| DELETE | /wipe | Delete all memory (irreversible) | header: X-Confirm-Wipe: true | {"wiped":true} |
| GET | /stats | Entity/chunk/graph counts + uptime | | StatsResponse |
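To make the table concrete, the sketch below builds standard-library urllib requests for a few of the endpoints without sending them. The JSON field names in the ingest body (`text`, `session_id`) are an assumption based on the Python SDK's `ingest(text, session_id=...)` call, not a documented schema; check docs/api.md for the real IngestRequest shape. All helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"
JSON_HEADERS = {"Content-Type": "application/json"}


def ingest_request(text, session_id=None, base=BASE_URL):
    # Assumption: IngestRequest holds the text under "text" plus an optional "session_id".
    body = {"text": text}
    if session_id is not None:
        body["session_id"] = session_id
    return Request(f"{base}/ingest", data=json.dumps(body).encode(),
                   headers=JSON_HEADERS, method="POST")


def list_entities_request(limit=50, offset=0, base=BASE_URL):
    # The default limit/offset here are arbitrary, not the server's defaults.
    query = urlencode({"limit": limit, "offset": offset})
    return Request(f"{base}/entities?{query}", method="GET")


def neighbors_request(entity_id, depth=1, base=BASE_URL):
    if depth > 5:  # the API table documents a maximum depth of 5
        raise ValueError("depth must be at most 5")
    return Request(f"{base}/entities/{entity_id}/neighbors?depth={depth}", method="GET")


def wipe_request(base=BASE_URL):
    # DELETE /wipe requires the X-Confirm-Wipe: true header, per the table above.
    return Request(f"{base}/wipe", headers={"X-Confirm-Wipe": "true"}, method="DELETE")


def send(req, timeout=5.0):
    """Send a request to a running mnemo server and decode the JSON response."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

With a server running, `send(list_entities_request(limit=10))` would return the decoded Entity list.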

Key request/response types:

Full endpoint documentation with curl examples: docs/api.md


| Variable | Default | Description |
|---|---|---|
| MNEMO_DB_PATH | mnemo.db | SQLite database file path |
| MNEMO_PORT | 8080 | API server port |
| MNEMO_LLM_BASE_URL | http://localhost:11434/v1 | OpenAI-compatible LLM base URL |
| MNEMO_LLM_MODEL | llama3 | Model name for entity extraction |
| MNEMO_LLM_API_KEY | ollama | API key (any value works for Ollama) |
| MNEMO_LLM_PROVIDER | ollama | Provider type: ollama, openai, anthropic, custom |

Pass --config path/to/config.toml to mnemo-api. See mnemo.example.toml:

db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1

Environment variables take precedence over TOML values. The active config source is reported in the config_source field of the GET /health response.
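The precedence rule (built-in defaults, then TOML, then environment variables) can be illustrated with a small Python resolver. Mapping each MNEMO_* variable to a TOML key follows the variable list and the example TOML above, but it is an assumption about how mnemo-api lines them up; the function and constant names are hypothetical, and "mistral" is just an example value.

```python
import copy
import os

# Defaults from the environment-variable list above.
DEFAULTS = {
    "db_path": "mnemo.db",
    "port": 8080,
    "llm": {
        "provider": "ollama",
        "base_url": "http://localhost:11434/v1",
        "model": "llama3",
        "api_key": "ollama",
    },
}

# Environment variable -> (TOML section or None for top level, key, type converter).
ENV_OVERRIDES = {
    "MNEMO_DB_PATH": (None, "db_path", str),
    "MNEMO_PORT": (None, "port", int),
    "MNEMO_LLM_BASE_URL": ("llm", "base_url", str),
    "MNEMO_LLM_MODEL": ("llm", "model", str),
    "MNEMO_LLM_API_KEY": ("llm", "api_key", str),
    "MNEMO_LLM_PROVIDER": ("llm", "provider", str),
}


def resolve_config(toml_data=None, env=None):
    """Layer built-in defaults < TOML file < environment variables."""
    env = os.environ if env is None else env
    cfg = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    for key, value in (toml_data or {}).items():
        if isinstance(value, dict):  # a [section] table such as [llm]
            cfg.setdefault(key, {}).update(value)
        else:
            cfg[key] = value
    for var, (section, key, convert) in ENV_OVERRIDES.items():
        if var in env:
            target = cfg if section is None else cfg[section]
            target[key] = convert(env[var])
    return cfg


toml_data = {"port": 9090, "llm": {"model": "mistral", "timeout_secs": 30}}
cfg = resolve_config(toml_data, env={"MNEMO_LLM_MODEL": "gpt-4o-mini"})
print(cfg["port"], cfg["llm"]["model"], cfg["llm"]["timeout_secs"])  # 9090 gpt-4o-mini 30
```

To feed it a real file, parse the TOML with the standard library (Python 3.11+): `with open("mnemo.example.toml", "rb") as f: toml_data = tomllib.load(f)`.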


Install:

cargo install --path crates/mnemo-cli

Usage:

# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity <uuid> --neighbors

# List memory chunks
mnemo chunks

# Server health
mnemo health

# Memory statistics
mnemo stats

# Delete everything (prompts for confirmation)
mnemo wipe

# Skip confirmation prompt
mnemo wipe --yes

# Point at a non-default server
mnemo --server http://192.168.1.10:8080 stats

Install:

See sdk/python/README.md for the full API reference.

Async example:

import asyncio
from mnemo import AsyncMnemoClient

async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)

asyncio.run(main())

A working standalone example: examples/basic_usage.py


Four Rust crates wired together:

| Crate | Type | Role |
|---|---|---|
| mnemo-core | lib | Entity extraction, graph ops, retrieval engine, DB layer |
| mnemo-api | bin | Axum REST API: a thin handler layer over mnemo-core |
| mnemo-cli | bin | CLI tool using blocking reqwest against the API |
| mnemo-bench | bin | Performance benchmarks (12 suites) |

Full architecture documentation: docs/architecture.md


Benchmarked on Apple M2 with SQLite in WAL mode and an in-memory petgraph. These are debug-build numbers; a release build (--release) is 3–5× faster.

| Operation | Avg latency | Throughput |
|---|---|---|
| Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s |
| Entity lookup by ID | ~0.08 ms | ~12,500 ops/s |
| Chunk insert | ~0.14 ms | ~7,100 ops/s |
| Full-text chunk search | ~0.28 ms | ~3,500 ops/s |
| Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s |
| Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s |
| Full retrieval pipeline | ~4.2 ms | ~238 ops/s |

Run cargo run -p mnemo-bench to benchmark on your hardware.
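The throughput column appears to follow from the latency column for a single worker running operations back to back: ops/s ≈ 1000 / avg_ms, rounded (for the full pipeline, 1000 / 4.2 ms ≈ 238 ops/s). A quick Python check of that arithmetic, using the latencies from the table:

```python
# Average latencies in milliseconds, copied from the benchmark table above.
AVG_LATENCY_MS = {
    "Entity insert (SQLite)": 0.12,
    "Entity lookup by ID": 0.08,
    "Chunk insert": 0.14,
    "Full-text chunk search": 0.28,
    "Graph neighbor (depth=1)": 0.21,
    "Graph neighbor (depth=2)": 0.89,
    "Full retrieval pipeline": 4.2,
}


def implied_throughput(avg_ms):
    """Operations per second for one sequential worker at the given average latency."""
    return 1000.0 / avg_ms


for op, ms in AVG_LATENCY_MS.items():
    print(f"{op:<26} {ms:>5.2f} ms  ~{implied_throughput(ms):>8,.0f} ops/s")
```

By the same arithmetic, the stated 3 to 5× release-build speedup would put the full retrieval pipeline at roughly 0.8 to 1.4 ms on the same machine.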


cargo test --workspace          # run all 122 tests
make coverage                  # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary          # summary to stdout
cd sdk/python && pytest tests/ -v
cargo run -p mnemo-bench                    # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph  # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json # save results to JSON

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks


PRs welcome. Please run make fmt && make lint before submitting. Open an issue first for large changes.

See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.


MIT — see LICENSE
