Show HN: GoModel – an open-source AI gateway written in Go; 44x lighter than LiteLLM

Original link: https://github.com/ENTERPILOT/GOModel/

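## GoModel: a unified AI gateway

GoModel is a high-performance AI gateway built in Go. It provides a single, OpenAI-compatible API for accessing multiple AI providers (including OpenAI, Anthropic, Gemini, Groq, and others), and simplifies integration with different LLMs through one unified interface.

**Key features:**

* **Multi-provider support:** connects to a wide range of AI providers with minimal configuration.
* **Easy deployment:** deployed with Docker; API keys are configured through simple environment variables.
* **OpenAI compatibility:** uses the OpenAI API structure, which eases migration and integration.
* **Advanced caching:** includes a two-layer response cache (exact-match and semantic) to reduce cost and latency.
* **Security:** supports API-key authentication configured through an environment variable (recommended for production).
* **Monitoring and management:** provides Prometheus metrics, audit logs, and an admin dashboard.
* **Passthrough routing:** provides direct access to provider-native APIs.

**Getting started:** deploy with Docker, supplying the necessary API keys as environment variables. Once the gateway is running, a basic API call can be made with `curl` against `/v1/chat/completions`. Detailed documentation and configuration options are available, including `.env` file support for safer key handling.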

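## GoModel: a lightweight AI gateway

Jakub, a solo developer, has released GoModel, a new open-source AI gateway built in Go. GoModel is designed to sit between applications and AI providers (such as OpenAI and Anthropic) and addresses key needs such as usage tracking, easy model switching, debugging, and cost reduction through caching.

Its main advantage is a very small footprint: a 17MB Docker image, far smaller than alternatives such as LiteLLM (746MB). GoModel prioritizes visibility, offering an inspectable request flow and an environment-variable-first approach to configuration.

The release was prompted in part by recent LiteLLM security issues and gives users an alternative. It is available at [https://gomodel.enterpilot.io](https://gomodel.enterpilot.io), and the developer welcomes feedback. One user asked how it compares with Bifrost, another router written in Go.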

Original text


A high-performance AI gateway written in Go, providing a unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.

[Image: animated GoModel AI gateway dashboard showing usage analytics, token tracking, and estimated cost monitoring]

Quick Start - Deploy the AI Gateway

Step 1: Start GoModel

docker run --rm -p 8080:8080 \
  -e LOGGING_ENABLED=true \
  -e LOGGING_LOG_BODIES=true \
  -e LOG_FORMAT=text \
  -e LOGGING_LOG_HEADERS=true \
  -e OPENAI_API_KEY="your-openai-key" \
  enterpilot/gomodel

Pass only the provider credentials or base URL you need (at least one required):

docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY="your-openai-key" \
  -e ANTHROPIC_API_KEY="your-anthropic-key" \
  -e GEMINI_API_KEY="your-gemini-key" \
  -e GROQ_API_KEY="your-groq-key" \
  -e OPENROUTER_API_KEY="your-openrouter-key" \
  -e ZAI_API_KEY="your-zai-key" \
  -e XAI_API_KEY="your-xai-key" \
  -e AZURE_API_KEY="your-azure-key" \
  -e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
  -e AZURE_API_VERSION="2024-10-21" \
  -e ORACLE_API_KEY="your-oracle-key" \
  -e ORACLE_BASE_URL="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/v1" \
  -e ORACLE_MODELS="openai.gpt-oss-120b,xai.grok-3" \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434/v1" \
  enterpilot/gomodel

⚠️ Avoid passing secrets via -e on the command line - they can leak via shell history and process lists. For production, use docker run --env-file .env to load API keys from a file instead.

Step 2: Make your first API call

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-chat-latest",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

That's it! GoModel automatically detects which providers are available based on the credentials you supply.
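
The same call can be made from application code. The Go sketch below builds the POST request shown in the curl example and decodes the reply. It assumes the gateway returns the standard OpenAI chat-completion shape (`choices[0].message.content`); the names `buildChatRequest`, `chatRequest`, and `chatResponse` are illustrative and are not part of GoModel.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// chatMessage and chatRequest mirror the JSON body used in the curl example.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// chatResponse keeps only the fields this sketch reads. It assumes the
// standard OpenAI chat-completion response shape.
type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// buildChatRequest (hypothetical helper) prepares a POST to /v1/chat/completions.
func buildChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildChatRequest("http://localhost:8080", "gpt-5-chat-latest", "Hello!")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is the gateway running?):", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Println("unexpected status:", resp.Status)
		return
	}
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode response:", err)
		return
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}
```

Keeping request construction in its own function makes it easy to check the URL, method, and headers without a running gateway.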

Example model identifiers are illustrative and subject to change; consult provider catalogs for current models. Feature columns reflect gateway API support, not every individual model capability exposed by an upstream provider.

| Provider | Credential | Example Model | Chat | /responses | Embed | Files | Batches | Passthru |
|---|---|---|---|---|---|---|---|---|
| OpenAI | OPENAI_API_KEY | gpt-4o-mini | | | | | | |
| Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 | | | | | | |
| Google Gemini | GEMINI_API_KEY | gemini-2.5-flash | | | | | | |
| Groq | GROQ_API_KEY | llama-3.3-70b-versatile | | | | | | |
| OpenRouter | OPENROUTER_API_KEY | google/gemini-2.5-flash | | | | | | |
| Z.ai | ZAI_API_KEY (ZAI_BASE_URL optional) | glm-5.1 | | | | | | |
| xAI (Grok) | XAI_API_KEY | grok-2 | | | | | | |
| Azure OpenAI | AZURE_API_KEY + AZURE_BASE_URL (AZURE_API_VERSION optional) | gpt-4o | | | | | | |
| Oracle | ORACLE_API_KEY + ORACLE_BASE_URL | openai.gpt-oss-120b | | | | | | |
| Ollama | OLLAMA_BASE_URL | llama3.2 | | | | | | |

✅ Supported ❌ Unsupported

For Z.ai's GLM Coding Plan, set ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4. For Oracle, set ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3 when the upstream /models endpoint is unavailable.


Alternative Setup Methods

Prerequisites: Go 1.26.2+

  1. Create a .env file, for example by copying the template: cp .env.template .env

  2. Add your API keys to .env (at least one required).

  3. Start the server:

Infrastructure only (Redis, PostgreSQL, MongoDB, Adminer - no image build):

docker compose up -d
# or: make infra

Full stack (adds GoModel + Prometheus; builds the app image):

cp .env.template .env
# Add your API keys to .env
docker compose --profile app up -d
# or: make image

Building the Docker Image Locally

docker build -t gomodel .
docker run --rm -p 8080:8080 --env-file .env gomodel

OpenAI-Compatible API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/responses | POST | OpenAI Responses API |
| /v1/embeddings | POST | Text embeddings |
| /v1/files | POST | Upload a file (OpenAI-compatible multipart) |
| /v1/files | GET | List files |
| /v1/files/{id} | GET | Retrieve file metadata |
| /v1/files/{id} | DELETE | Delete a file |
| /v1/files/{id}/content | GET | Retrieve raw file content |
| /v1/batches | POST | Create a native provider batch (OpenAI-compatible schema; inline requests supported where provider-native) |
| /v1/batches | GET | List stored batches |
| /v1/batches/{id} | GET | Retrieve one stored batch |
| /v1/batches/{id}/cancel | POST | Cancel a pending batch |
| /v1/batches/{id}/results | GET | Retrieve native batch results when available |
| /p/{provider}/... | GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS | Provider-native passthrough with opaque upstream responses |
| /v1/models | GET | List available models |
| /health | GET | Health check |
| /metrics | GET | Prometheus metrics (when enabled) |
| /admin/api/v1/usage/summary | GET | Aggregate token usage statistics |
| /admin/api/v1/usage/daily | GET | Per-period token usage breakdown |
| /admin/api/v1/usage/models | GET | Usage breakdown by model |
| /admin/api/v1/usage/log | GET | Paginated usage log entries |
| /admin/api/v1/audit/log | GET | Paginated audit log entries |
| /admin/api/v1/audit/conversation | GET | Conversation thread around one audit log entry |
| /admin/api/v1/models | GET | List models with provider type |
| /admin/api/v1/models/categories | GET | List model categories |
| /admin/dashboard | GET | Admin dashboard UI |
| /swagger/index.html | GET | Swagger UI (when enabled) |

GoModel is configured through environment variables and an optional config.yaml. Environment variables override YAML values. See .env.template and config/config.example.yaml for the available options.
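
The precedence rule can be pictured with a small Go sketch. This is not GoModel's configuration loader: `resolve` and `lookupFunc` are hypothetical names, YAML parsing is omitted, and treating an empty environment variable as "unset" is an assumption made here for simplicity.

```go
package main

import (
	"fmt"
	"os"
)

// lookupFunc matches the signature of os.LookupEnv so that a fake
// environment can be substituted when testing.
type lookupFunc func(key string) (string, bool)

// resolve returns the environment value if it is set and non-empty,
// otherwise the YAML value if non-empty, otherwise the default.
func resolve(lookup lookupFunc, envKey, yamlValue, def string) string {
	if v, ok := lookup(envKey); ok && v != "" {
		return v
	}
	if yamlValue != "" {
		return yamlValue
	}
	return def
}

func main() {
	// yamlPort stands in for a value that would be parsed from config.yaml.
	yamlPort := "9090"
	fmt.Println("port:", resolve(os.LookupEnv, "PORT", yamlPort, "8080"))
}
```

With PORT exported in the shell, the environment value is printed; otherwise the YAML stand-in is used, and the default only applies when both are empty.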

Key settings:

| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Server port |
| GOMODEL_MASTER_KEY | (none) | API key for authentication |
| ENABLE_PASSTHROUGH_ROUTES | true | Enable provider-native passthrough routes under /p/{provider}/... |
| ALLOW_PASSTHROUGH_V1_ALIAS | true | Allow /p/{provider}/v1/... aliases while keeping /p/{provider}/... canonical |
| ENABLED_PASSTHROUGH_PROVIDERS | openai,anthropic,openrouter,zai | Comma-separated list of enabled passthrough providers |
| STORAGE_TYPE | sqlite | Storage backend (sqlite, postgresql, mongodb) |
| METRICS_ENABLED | false | Enable Prometheus metrics |
| LOGGING_ENABLED | false | Enable audit logging |
| GUARDRAILS_ENABLED | false | Enable the configured guardrails pipeline |

Quick Start - Authentication: By default, GOMODEL_MASTER_KEY is unset. Without this key, API endpoints are unprotected and anyone who can reach the service can call them, which is insecure for production. Set a strong secret before exposing the service: add GOMODEL_MASTER_KEY to your .env file or environment for production deployments.
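
One way to produce a strong secret is to draw it from the operating system's random source. The Go sketch below prints a line suitable for pasting into .env. The helper name `newMasterKey` and the 32-byte length are choices made for this example; GoModel's documentation here does not prescribe a key format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newMasterKey returns n bytes from the OS cryptographic random source,
// hex-encoded (so the string is 2*n characters long).
func newMasterKey(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	// 32 bytes = 256 bits; the length is an arbitrary choice for this sketch.
	key, err := newMasterKey(32)
	if err != nil {
		fmt.Println("generate key:", err)
		return
	}
	// Paste this line into .env rather than passing it on the command line.
	fmt.Println("GOMODEL_MASTER_KEY=" + key)
}
```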


GoModel has a two-layer response cache that reduces LLM API costs and latency for repeated or semantically similar requests.

Layer 1 - Exact-match cache

Hashes the full request (path + Workflow + body) and returns a stored response when a later request is byte-identical. Lookups take less than a millisecond. Enable it with the environment variables RESPONSE_CACHE_SIMPLE_ENABLED and REDIS_URL.

Responses served from this layer carry X-Cache: HIT (exact).
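
To make "byte-identical" concrete, here is a minimal Go sketch of an exact-match cache key. It is not GoModel's implementation: the function `exactCacheKey`, the use of SHA-256, the length-prefixed field layout, and the `"default"` workflow value are all assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// exactCacheKey (hypothetical) hashes the path, a workflow identifier, and the
// raw request body. Length prefixes keep the field boundaries unambiguous.
func exactCacheKey(path, workflow string, body []byte) string {
	h := sha256.New()
	fmt.Fprintf(h, "%d:%s|%d:%s|", len(path), path, len(workflow), workflow)
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	path := "/v1/chat/completions"
	a := exactCacheKey(path, "default", []byte(`{"model":"gpt-4o-mini","messages":[]}`))
	b := exactCacheKey(path, "default", []byte(`{"model":"gpt-4o-mini","messages":[]}`))
	// Same JSON meaning, but one extra space after the first colon.
	c := exactCacheKey(path, "default", []byte(`{"model": "gpt-4o-mini","messages":[]}`))

	fmt.Println("identical bodies share a key:", a == b)
	fmt.Println("whitespace change alters the key:", a != c)
}
```

Because the key depends on raw bytes, clients that serialize the same request differently (key order, whitespace) will miss this layer even when the requests mean the same thing.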

Layer 2 - Semantic cache

Embeds the last user message via your configured provider’s OpenAI-compatible /v1/embeddings API (cache.response.semantic.embedder.provider must name a key in the top-level providers map) and performs a KNN vector search. Semantically equivalent queries, e.g. "What's the capital of France?" vs "Which city is France's capital?", can return the same cached response without an upstream LLM call.

Expected hit rates: ~60–70% in high-repetition workloads vs. ~18% for exact-match alone.

Responses served from this layer carry X-Cache: HIT (semantic).

Supported vector backends: qdrant, pgvector, pinecone, weaviate (set cache.response.semantic.vector_store.type and the matching nested block).
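
The semantic lookup described above can be illustrated with a toy nearest-neighbour search in Go. This is a sketch under stated assumptions, not how GoModel or its vector backends work internally: it scans an in-memory slice instead of querying a vector store, uses cosine similarity with a made-up threshold of 0.9, and the two-dimensional vectors are hand-written stand-ins for real embeddings. The names `cachedAnswer`, `cosine`, and `nearest` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cachedAnswer pairs a stored embedding with the response cached for it.
type cachedAnswer struct {
	vec      []float64
	response string
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, of different length, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest returns the cached response most similar to query, provided its
// similarity is at least threshold. The bool reports whether there was a hit.
func nearest(entries []cachedAnswer, query []float64, threshold float64) (string, bool) {
	best, bestSim := -1, threshold
	for i, e := range entries {
		if s := cosine(e.vec, query); s >= bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 {
		return "", false
	}
	return entries[best].response, true
}

func main() {
	cache := []cachedAnswer{
		{vec: []float64{1, 0}, response: "Paris"},
		{vec: []float64{0, 1}, response: "Berlin"},
	}
	// A query vector close to the first entry counts as a semantic hit.
	resp, hit := nearest(cache, []float64{0.9, 0.1}, 0.9)
	fmt.Println(resp, hit)
	// A query halfway between both entries falls below the threshold.
	resp, hit = nearest(cache, []float64{0.7, 0.7}, 0.9)
	fmt.Println(resp, hit)
}
```

The threshold is the main tuning knob in a scheme like this: a lower value raises the hit rate but increases the chance of returning an answer to a question that only looks similar.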

Both cache layers run after guardrail/workflow patching so they always see the final prompt. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per-request.


See DEVELOPMENT.md for testing, linting, and pre-commit setup.


Join our Discord to connect with other GoModel users.

