Show HN: GoModel – an open-source AI gateway written in Go; 44x lighter than LiteLLM

Original link: https://github.com/ENTERPILOT/GOModel/

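## GoModel: A Unified AI Gateway

GoModel is a high-performance AI gateway built in Go that exposes a single OpenAI-compatible API for multiple AI providers, including OpenAI, Anthropic, Gemini, and Groq. Its unified interface simplifies integration with different LLMs.

**Key features:**

* **Multi-provider support:** Connects to a wide range of AI providers with minimal configuration.
* **Easy deployment:** Runs via Docker, with API keys configured through environment variables.
* **OpenAI compatibility:** Uses the OpenAI API structure, which eases migration and integration.
* **Advanced caching:** Includes a two-layer response cache (exact-match and semantic) to reduce cost and latency.
* **Security:** Supports API key authentication via an environment variable (recommended for production).
* **Monitoring and management:** Provides Prometheus metrics, audit logs, and an admin dashboard.
* **Passthrough routing:** Provides direct access to provider-native APIs.

**Getting started:** Deploy with Docker, supplying the necessary API keys as environment variables. Once it is running, a basic API call can be made with `curl` to `/v1/chat/completions`. Detailed documentation and configuration options are available, including support for `.env` files for safer key handling.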

Original text

A high-performance AI gateway written in Go, providing a unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.

[Image: animated GoModel AI gateway dashboard showing usage analytics, token tracking, and estimated cost monitoring]

Quick Start - Deploy the AI Gateway

Step 1: Start GoModel

docker run --rm -p 8080:8080 \
  -e LOGGING_ENABLED=true \
  -e LOGGING_LOG_BODIES=true \
  -e LOG_FORMAT=text \
  -e LOGGING_LOG_HEADERS=true \
  -e OPENAI_API_KEY="your-openai-key" \
  enterpilot/gomodel

Pass only the provider credentials or base URL you need (at least one required):

docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY="your-openai-key" \
  -e ANTHROPIC_API_KEY="your-anthropic-key" \
  -e GEMINI_API_KEY="your-gemini-key" \
  -e GROQ_API_KEY="your-groq-key" \
  -e OPENROUTER_API_KEY="your-openrouter-key" \
  -e ZAI_API_KEY="your-zai-key" \
  -e XAI_API_KEY="your-xai-key" \
  -e AZURE_API_KEY="your-azure-key" \
  -e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
  -e AZURE_API_VERSION="2024-10-21" \
  -e ORACLE_API_KEY="your-oracle-key" \
  -e ORACLE_BASE_URL="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/v1" \
  -e ORACLE_MODELS="openai.gpt-oss-120b,xai.grok-3" \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434/v1" \
  enterpilot/gomodel

⚠️ Avoid passing secrets via -e on the command line - they can leak via shell history and process lists. For production, use docker run --env-file .env to load API keys from a file instead.

Step 2: Make your first API call

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-chat-latest",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

That's it! GoModel automatically detects which providers are available based on the credentials you supply.
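The same call can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the gateway from Step 1 is listening on localhost:8080; the helper names `build_chat_request` and `send` are illustrative and not part of GoModel.

```python
import json
import urllib.request

# Assumption: the gateway started in Step 1 is reachable at this address.
GATEWAY_URL = "http://localhost:8080"


def build_chat_request(model: str, user_message: str,
                       base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the gateway running, `send(build_chat_request("gpt-5-chat-latest", "Hello!"))` should return the decoded JSON response. Reading the reply text from `["choices"][0]["message"]["content"]` assumes the usual OpenAI response shape.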

Example model identifiers are illustrative and subject to change; consult provider catalogs for current models. Feature columns reflect gateway API support, not every individual model capability exposed by an upstream provider.

Provider	Credential	Example Model	Chat	`/responses`	Embed	Files	Batches	Passthru
OpenAI	`OPENAI_API_KEY`	`gpt-4o-mini`	✅	✅	✅	✅	✅	✅
Anthropic	`ANTHROPIC_API_KEY`	`claude-sonnet-4-20250514`	✅	✅	❌	❌	✅	✅
Google Gemini	`GEMINI_API_KEY`	`gemini-2.5-flash`	✅	✅	✅	✅	✅	❌
Groq	`GROQ_API_KEY`	`llama-3.3-70b-versatile`	✅	✅	✅	✅	✅	❌
OpenRouter	`OPENROUTER_API_KEY`	`google/gemini-2.5-flash`	✅	✅	✅	✅	✅	✅
Z.ai	`ZAI_API_KEY` (`ZAI_BASE_URL` optional)	`glm-5.1`	✅	✅	✅	❌	❌	✅
xAI (Grok)	`XAI_API_KEY`	`grok-2`	✅	✅	✅	✅	✅	❌
Azure OpenAI	`AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional)	`gpt-4o`	✅	✅	✅	✅	✅	✅
Oracle	`ORACLE_API_KEY` + `ORACLE_BASE_URL`	`openai.gpt-oss-120b`	✅	✅	❌	❌	❌	❌
Ollama	`OLLAMA_BASE_URL`	`llama3.2`	✅	✅	✅	❌	❌	❌

✅ Supported ❌ Unsupported

For Z.ai's GLM Coding Plan, set ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4. For Oracle, set ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3 when the upstream /models endpoint is unavailable.

Alternative Setup Methods

Prerequisites: Go 1.26.2+

1. Create a .env file from the template (cp .env.template .env).
2. Add your API keys to .env (at least one required).
3. Start the server.

Infrastructure only (Redis, PostgreSQL, MongoDB, Adminer - no image build):

docker compose up -d
# or: make infra

Full stack (adds GoModel + Prometheus; builds the app image):

cp .env.template .env
# Add your API keys to .env
docker compose --profile app up -d
# or: make image

Building the Docker Image Locally

docker build -t gomodel .
docker run --rm -p 8080:8080 --env-file .env gomodel

OpenAI-Compatible API Endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (streaming supported)
`/v1/responses`	POST	OpenAI Responses API
`/v1/embeddings`	POST	Text embeddings
`/v1/files`	POST	Upload a file (OpenAI-compatible multipart)
`/v1/files`	GET	List files
`/v1/files/{id}`	GET	Retrieve file metadata
`/v1/files/{id}`	DELETE	Delete a file
`/v1/files/{id}/content`	GET	Retrieve raw file content
`/v1/batches`	POST	Create a native provider batch (OpenAI-compatible schema; inline `requests` supported where provider-native)
`/v1/batches`	GET	List stored batches
`/v1/batches/{id}`	GET	Retrieve one stored batch
`/v1/batches/{id}/cancel`	POST	Cancel a pending batch
`/v1/batches/{id}/results`	GET	Retrieve native batch results when available
`/p/{provider}/...`	GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS	Provider-native passthrough with opaque upstream responses
`/v1/models`	GET	List available models
`/health`	GET	Health check
`/metrics`	GET	Prometheus metrics (when enabled)
`/admin/api/v1/usage/summary`	GET	Aggregate token usage statistics
`/admin/api/v1/usage/daily`	GET	Per-period token usage breakdown
`/admin/api/v1/usage/models`	GET	Usage breakdown by model
`/admin/api/v1/usage/log`	GET	Paginated usage log entries
`/admin/api/v1/audit/log`	GET	Paginated audit log entries
`/admin/api/v1/audit/conversation`	GET	Conversation thread around one audit log entry
`/admin/api/v1/models`	GET	List models with provider type
`/admin/api/v1/models/categories`	GET	List model categories
`/admin/dashboard`	GET	Admin dashboard UI
`/swagger/index.html`	GET	Swagger UI (when enabled)
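The table notes that /v1/chat/completions supports streaming. As a rough sketch, and assuming the gateway streams in the same server-sent-event format as the OpenAI API (`data: {...}` lines ending with `data: [DONE]`), a client could reassemble the reply text as below. The function name `iter_stream_text` and the sample lines are hand-written for illustration, not captured from GoModel.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream (not real gateway output).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello!"
```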

GoModel is configured through environment variables and an optional config.yaml. Environment variables override YAML values. See .env.template and config/config.example.yaml for the available options.
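The precedence rule (environment variables win over YAML) can be pictured with a small sketch. This is not GoModel's loader: `resolve_setting` is a hypothetical helper, and the plain dictionaries stand in for the process environment and a parsed config.yaml, whose real key names may differ from the environment variable names used here.

```python
from typing import Mapping, Optional


def resolve_setting(name: str,
                    env: Mapping[str, str],
                    yaml_values: Mapping[str, str],
                    default: Optional[str] = None) -> Optional[str]:
    """Return the environment value if set, else the YAML value, else the default."""
    if name in env:
        return env[name]
    if name in yaml_values:
        return yaml_values[name]
    return default


# Stand-ins for a parsed config.yaml and os.environ (illustrative values).
yaml_values = {"PORT": "9090", "STORAGE_TYPE": "postgresql"}
env = {"PORT": "8081"}

print(resolve_setting("PORT", env, yaml_values, "8080"))            # env overrides YAML
print(resolve_setting("STORAGE_TYPE", env, yaml_values, "sqlite"))  # YAML overrides default
print(resolve_setting("METRICS_ENABLED", env, yaml_values, "false"))  # falls back to default
```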

Key settings:

Variable	Default	Description
`PORT`	`8080`	Server port
`GOMODEL_MASTER_KEY`	(none)	API key for authentication
`ENABLE_PASSTHROUGH_ROUTES`	`true`	Enable provider-native passthrough routes under `/p/{provider}/...`
`ALLOW_PASSTHROUGH_V1_ALIAS`	`true`	Allow `/p/{provider}/v1/...` aliases while keeping `/p/{provider}/...` canonical
`ENABLED_PASSTHROUGH_PROVIDERS`	`openai,anthropic,openrouter,zai`	Comma-separated list of enabled passthrough providers
`STORAGE_TYPE`	`sqlite`	Storage backend (`sqlite`, `postgresql`, `mongodb`)
`METRICS_ENABLED`	`false`	Enable Prometheus metrics
`LOGGING_ENABLED`	`false`	Enable audit logging
`GUARDRAILS_ENABLED`	`false`	Enable the configured guardrails pipeline
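To illustrate how the passthrough settings fit together, here is a hedged client-side sketch that parses the comma-separated ENABLED_PASSTHROUGH_PROVIDERS value and builds a `/p/{provider}/...` URL. Both function names are hypothetical, the native path in the usage note is a placeholder, and the gateway's own validation may behave differently.

```python
# Default taken from the settings table.
DEFAULT_ENABLED = "openai,anthropic,openrouter,zai"


def parse_enabled_providers(value: str = DEFAULT_ENABLED) -> set[str]:
    """Split the comma-separated list, ignoring blanks and surrounding whitespace."""
    return {item.strip() for item in value.split(",") if item.strip()}


def passthrough_url(base_url: str, provider: str, native_path: str,
                    enabled: set[str]) -> str:
    """Build a /p/{provider}/... URL, refusing providers that are not enabled."""
    if provider not in enabled:
        raise ValueError(f"passthrough is not enabled for {provider!r}")
    return f"{base_url.rstrip('/')}/p/{provider}/{native_path.lstrip('/')}"
```

For example, `passthrough_url("http://localhost:8080", "anthropic", "some/native/path", parse_enabled_providers())` yields `http://localhost:8080/p/anthropic/some/native/path`, while a provider missing from the list raises `ValueError`.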

Quick Start - Authentication: By default, GOMODEL_MASTER_KEY is unset, which leaves the API endpoints unprotected: anyone who can reach the service can call them. This is insecure for production. Set GOMODEL_MASTER_KEY to a strong secret in your .env file or environment before exposing the service.
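One way to produce a strong secret and attach it to requests is sketched below. `secrets.token_urlsafe` is from the Python standard library. Sending the key as an OpenAI-style `Authorization: Bearer` header is an assumption based on the gateway's OpenAI compatibility, so check the project documentation for the exact scheme. Both helper names are illustrative.

```python
import secrets


def generate_master_key(num_bytes: int = 32) -> str:
    """Return a random URL-safe secret that could be used as GOMODEL_MASTER_KEY."""
    return secrets.token_urlsafe(num_bytes)


def auth_headers(master_key: str) -> dict[str, str]:
    """Headers for an authenticated call (assumes Bearer-token authentication)."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {master_key}",
    }
```

Store the generated value in .env rather than on the command line, for the same reason given in the warning about `-e` flags.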

GoModel has a two-layer response cache that reduces LLM API costs and latency for repeated or semantically similar requests.

Layer 1 - Exact-match cache

Hashes the full request (path + Workflow + body) and returns a stored response for byte-identical requests. Lookups are sub-millisecond. Enable it with the environment variables RESPONSE_CACHE_SIMPLE_ENABLED and REDIS_URL.

Responses served from this layer carry X-Cache: HIT (exact).
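The idea behind an exact-match layer can be sketched as follows. This is an illustration of the technique, not GoModel's implementation: the choice of SHA-256, the length-prefixed encoding, the treatment of the workflow as an opaque string, and the in-memory dictionary standing in for Redis are all assumptions.

```python
import hashlib
from typing import Optional


def cache_key(path: str, workflow: str, body: bytes) -> str:
    """SHA-256 over path, workflow, and raw body, length-prefixed to avoid ambiguity."""
    digest = hashlib.sha256()
    for part in (path.encode("utf-8"), workflow.encode("utf-8"), body):
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


class ExactCache:
    """In-memory stand-in for a Redis-backed response store."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get(self, path: str, workflow: str, body: bytes) -> Optional[bytes]:
        return self._store.get(cache_key(path, workflow, body))

    def put(self, path: str, workflow: str, body: bytes, response: bytes) -> None:
        self._store[cache_key(path, workflow, body)] = response
```

Because the key is computed over raw bytes, two bodies that differ only in whitespace or key order produce different keys and miss this layer. That is the gap the semantic layer is meant to cover.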

Layer 2 - Semantic cache

Embeds the last user message via your configured provider's OpenAI-compatible /v1/embeddings API (cache.response.semantic.embedder.provider must name a key in the top-level providers map) and performs a KNN vector search. Semantically equivalent queries - e.g. "What's the capital of France?" vs "Which city is France's capital?" - can return the same cached response without an upstream LLM call.

Expected hit rates: ~60–70% in high-repetition workloads vs. ~18% for exact-match alone.
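To put those rates in concrete terms, the short calculation below counts how many requests would still reach the provider. It treats the quoted figures as overall hit rates and uses 65% as the midpoint of the 60-70% range. Both are assumptions for illustration, and real workloads will vary.

```python
def upstream_calls(total_requests: int, hit_rate: float) -> int:
    """Requests that still reach the provider at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_requests * (1.0 - hit_rate))


for label, rate in [("exact-match only", 0.18), ("with semantic layer", 0.65)]:
    print(f"{label}: {upstream_calls(10_000, rate)} upstream calls per 10,000 requests")
```

Under these assumptions, 10,000 requests result in 8,200 upstream calls with exact matching alone and 3,500 with the semantic layer added.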

Responses served from this layer carry X-Cache: HIT (semantic).

Supported vector backends: qdrant, pgvector, pinecone, weaviate (set cache.response.semantic.vector_store.type and the matching nested block).
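The semantic lookup can be pictured with a toy nearest-neighbour search. The sketch below is not how GoModel or any of the listed vector stores implements it: it uses brute-force cosine similarity over hand-made three-dimensional vectors, and the 0.9 similarity threshold is a hypothetical value. Real embeddings come from the configured /v1/embeddings provider and have far more dimensions.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Brute-force nearest-neighbour lookup over stored (embedding, response) pairs."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # hypothetical similarity cut-off
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_score, best_response = -1.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([0.9, 0.1, 0.0], "Paris")   # toy vector for "What's the capital of France?"
print(cache.get([0.88, 0.12, 0.01]))  # close paraphrase; prints "Paris"
print(cache.get([0.0, 0.2, 0.95]))    # unrelated question; prints "None"
```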

Both cache layers run after guardrail/workflow patching so they always see the final prompt. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per-request.
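A per-request bypass check might look like the sketch below. It only illustrates parsing the Cache-Control directives named above. `should_bypass_cache` is a hypothetical helper, and GoModel's own handling may differ in details such as how it treats other directives.

```python
from typing import Mapping


def should_bypass_cache(headers: Mapping[str, str]) -> bool:
    """True when Cache-Control contains a no-cache or no-store directive."""
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "cache-control":  # header names are case-insensitive
            value = header_value
    directives = {part.strip().lower() for part in value.split(",")}
    return bool(directives & {"no-cache", "no-store"})


print(should_bypass_cache({"Cache-Control": "no-cache"}))    # prints "True"
print(should_bypass_cache({"Cache-Control": "max-age=60"}))  # prints "False"
```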

See DEVELOPMENT.md for testing, linting, and pre-commit setup.

Join our Discord to connect with other GoModel users.