# Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code

Original link: https://ai.georgeliu.com/p/running-google-gemma-4-locally-with

## Gemma 4: powerful local LLM inference

Cloud AI APIs offer convenience, but they come with costs, privacy concerns, and limits. Running large language models (LLMs) locally is a compelling alternative for tasks like code review and testing, with zero API costs, data privacy, and constant availability. Google's Gemma 4, particularly the 26B-A4B model, is well suited to this thanks to its efficient mixture-of-experts (MoE) architecture. Because only a small fraction of the parameters activate during inference, it runs well on consumer hardware, reaching 51 tokens/second on a MacBook Pro with 48 GB of unified memory. Gemma 4 is a family of models, and the 26B-A4B strikes a balance between performance (close to the 31B dense model on benchmarks) and resource usage. The recent LM Studio update (v0.4.0) enables headless operation through a command-line interface, making local LLM serving far more flexible. Users can even alias Claude Code to run against Gemma 4 locally for offline coding assistance. While Gemma 4 is not a full replacement for cloud APIs, it offers a powerful and private solution for local inference, especially for focused tasks, and demonstrates the potential of MoE models for accessible AI.

## Local LLM inference gains ground: Gemma 4 and new tools

Recent advances have made running large language models (LLMs) locally far more practical and attractive. The release of Gemma 4, together with tools like LM Studio's headless CLI and Claude Code, is reshaping the landscape. Users report that local models are finally "good enough": no longer limited to simple demos, they can be integrated into real tools. A key development is the decoupling of coding agents (such as Claude Code and OpenCode) from the underlying models, which lets users switch easily between local and cloud models, combining the cost efficiency and privacy of local options with the power of cloud services. Discussions highlight the importance of low latency for effective tool use within these agents, with sub-300 ms response times considered critical; caching and efficient data handling are the key optimizations. While Claude Code remains popular, alternatives such as OpenCode and Pi are gaining traction, offering greater flexibility and compatibility with a range of backends. The ability to run models like Qwen 3.5 on consumer hardware, even with techniques like MoE and offloading to RAM, is broadening access to powerful AI capabilities. Performance can suffer significantly, however, from I/O bottlenecks caused by heavy reliance on RAM or disk swapping.

## Original article

Cloud AI APIs are great until they are not. Rate limits, usage costs, privacy concerns, and network latency all add up. For quick tasks like code review, drafting, or testing prompts, a local model that runs entirely on your hardware has real advantages: zero API costs, no data leaving your machine, and consistent availability.

Google’s Gemma 4 is interesting for local use because of its mixture-of-experts architecture. The 26B parameter model only activates 4B parameters per forward pass, which means it runs well on hardware that could never handle a dense 26B model. On my 14” MacBook Pro M4 Pro with 48 GB of unified memory, it fits comfortably and generates at 51 tokens per second, though in my experience there are significant slowdowns when it is used within Claude Code.

Google released Gemma 4 as a family of four models, not just one. The lineup spans a wide range of hardware targets:

The “E” models (E2B, E4B) use Per-Layer Embeddings to optimize for on-device deployment and are the only variants that support audio input (speech recognition and translation). The 31B dense model is the most capable, scoring 85.2% on MMLU Pro and 89.2% on AIME 2026.

Why I picked the 26B-A4B. The mixture-of-experts architecture is the key. It has 128 experts plus 1 shared expert, but only activates 8 experts (3.8B parameters) per token. A common rule of thumb estimates an MoE model's dense-equivalent quality as roughly sqrt(total × active) parameters, which puts this model around 10B effective. In practice, it delivers inference cost comparable to a 4B dense model with quality that punches well above that weight class. On benchmarks, it scores 82.6% on MMLU Pro and 88.3% on AIME 2026, close to the dense 31B (85.2% and 89.2%) while running dramatically faster.
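As a quick sanity check, the rule of thumb above can be computed directly (a sketch, using the headline 26B total / 4B active figures rather than the exact parameter counts):

```shell
# Rule-of-thumb dense-equivalent size for an MoE model:
#   effective ≈ sqrt(total_params × active_params)
# Back-of-the-envelope only; not an official formula.
moe_effective() {
  awk -v t="$1" -v a="$2" 'BEGIN { printf "%.1f\n", sqrt(t * a) }'
}
moe_effective 26 4   # Gemma 4 26B-A4B: prints 10.2 (billions of parameters)
```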

The chart below tells the story. It plots Elo score against total model size on a log scale for recent open-weight models with thinking enabled. The blue-highlighted region in the upper left is where you want to be: high performance, small footprint.

Gemma 4 26B-A4B (Elo ~1441) sits firmly in that zone, punching well above its 25.2B parameter weight. The 31B dense variant scores slightly higher (~1451) but is still remarkably compact. For context, models like Qwen 3.5 397B-A17B (~1450 Elo) and GLM-5 (~1457 Elo) need 100-600B total parameters to reach similar scores. Kimi-K2.5 (~1457 Elo) requires over 1,000B. The 26B-A4B achieves competitive Elo with a fraction of the parameters, which translates directly into lower memory requirements and faster local inference.

This is what makes MoE models transformative for local use. You do not need a cluster or a high-end GPU rig to run a model that competes with 400B+ parameter behemoths. A laptop with 48 GB of unified memory is enough.

For local inference on a 48 GB Mac, this is the sweet spot. The dense 31B would consume more memory and generate tokens slower because every parameter participates in every forward pass. The E4B is lighter but noticeably less capable. The 26B-A4B gives you 256K max context, vision support (useful for analyzing screenshots and diagrams), native function/tool calling, and reasoning with configurable thinking modes, all at 51 tokens/second on my hardware.

LM Studio has been a popular desktop app for running local models for a while. Version 0.4.0 changed the architecture fundamentally by introducing llmster, the core inference engine extracted from the desktop app and packaged as a standalone server.

The practical result: you can now run LM Studio entirely from the command line using the lms CLI. No GUI required. This makes it usable on headless servers, in CI/CD pipelines, SSH sessions, or just for developers who prefer staying in the terminal.

Key additions in 0.4.0:

Install the lms CLI with a single command:

# Linux/Mac
curl -fsSL https://lmstudio.ai/install.sh | bash

# Windows
irm https://lmstudio.ai/install.ps1 | iex

Then start the headless daemon:

lms daemon up

On macOS, update both inference runtimes:

lms runtime update llama.cpp
lms runtime update mlx

With the daemon running, download Google’s Gemma 4 26B model:

lms get google/gemma-4-26b-a4b

The CLI shows you the variant it will download (Q4_K_M quantization by default, 17.99 GB) and asks for confirmation:

   ↓ To download: model google/gemma-4-26b-a4b - 64.75 KB
   └─ ↓ To download: Gemma 4 26B A4B Instruct Q4_K_M [GGUF] - 17.99 GB

About to download 17.99 GB.

? Start download?
❯ Yes
  No
  Change variant selection

If you already have the model, the CLI tells you and shows the load command:

✔ Start download? yes
Model already downloaded. To use, run: lms load google/gemma-4-26b-a4b

List all downloaded models:

lms ls
You have 10 models, taking up 118.17 GB of disk space.

LLM                                   PARAMS     ARCH             SIZE         DEVICE
gemma-3-270m-it-mlx                   270m       gemma3_text      497.80 MB    Local
google/gemma-4-26b-a4b (1 variant)    26B-A4B    gemma4           17.99 GB     Local
gpt-oss-20b-mlx                       20B        gpt_oss          22.26 GB     Local
llama-3.2-1b-instruct                 1B         Llama            712.58 MB    Local
nvidia/nemotron-3-nano (1 variant)    30B        nemotron_h       17.79 GB     Local
openai/gpt-oss-20b (1 variant)        20B        gpt-oss          12.11 GB     Local
qwen/qwen3.5-35b-a3b (1 variant)      35B-A3B    qwen35moe        22.07 GB     Local
qwen2.5-0.5b-instruct-mlx             0.5B       Qwen2            293.99 MB    Local
zai-org/glm-4.7-flash (1 variant)     30B        glm4_moe_lite    24.36 GB     Local

EMBEDDING                               PARAMS    ARCH          SIZE        DEVICE
text-embedding-nomic-embed-text-v1.5              Nomic BERT    84.11 MB    Local

Worth noting: several of these models use mixture-of-experts architectures (Gemma 4, Qwen 3.5, GLM 4.7 Flash). MoE models punch above their weight for local inference because only a fraction of parameters activate per token.

Start a chat session with stats enabled to see performance numbers:

lms chat google/gemma-4-26b-a4b --stats
 ╭─────────────────────────────────────────────────╮
 │ 👾 lms chat                                     │
 │ Type exit or Ctrl+C to quit                     │
 │                                                 │
 │ Chatting with google/gemma-4-26b-a4b            │
 │                                                 │
 │ Try one of the following commands:              │
 │ /model - Load a model (type /model to see list) │
 │ /download - Download a model                    │
 │ /clear - Clear the chat history                 │
 │ /help - Show help information                   │
 ╰─────────────────────────────────────────────────╯

With --stats, you get prediction metrics after each response:

Prediction Stats:
  Stop Reason: eosFound
  Tokens/Second: 51.35
  Time to First Token: 1.551s
  Prompt Tokens: 39
  Predicted Tokens: 176
  Total Tokens: 215

51 tokens/second on a 14” MacBook Pro M4 Pro (48 GB) with a 26B model is solid. Time to first token at 1.5 seconds is responsive enough for interactive use.
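Those stats also imply the end-to-end latency of the response. A quick check (my own arithmetic, not from the LM Studio output): decode time is predicted tokens divided by tokens/second, plus the time to first token.

```shell
# End-to-end latency implied by the stats above:
# 176 predicted tokens at 51.35 tok/s, plus 1.551 s time to first token.
awk 'BEGIN { printf "%.1f s\n", 176 / 51.35 + 1.551 }'   # prints 5.0 s
```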

See what is currently loaded:

lms ps
IDENTIFIER                MODEL                     STATUS    SIZE        CONTEXT    PARALLEL    DEVICE    TTL
google/gemma-4-26b-a4b    google/gemma-4-26b-a4b    IDLE      17.99 GB    48000      2           Local     60m / 1h

The model occupies 17.99 GB in memory with a 48K context window and supports 2 parallel requests. The TTL (time-to-live) auto-unloads the model after 1 hour of idle time, freeing memory without manual intervention.

For detailed model metadata, pipe through jq:

lms ps --json | jq

Key fields from the JSON output:

Before loading a model, you can estimate memory requirements at different context lengths using --estimate-only. I wrote a small script (included later in this section) to test across the full range.

The base model takes about 17.6 GiB regardless of context. Each doubling of context length adds roughly 3-4 GiB. At the default 48K context, you need about 21 GiB. On my 48 GB MacBook Pro, I can push to the full 256K context at 37.48 GiB and still have about 10 GB free for the OS and other apps. A 36 GB Mac could comfortably run 200K context with headroom.

The estimation command is straightforward:

lms load google/gemma-4-26b-a4b --estimate-only --context-length 48000
Model: google/gemma-4-26b-a4b
Context Length: 48,000
Estimated GPU Memory:   21.05 GiB
Estimated Total Memory: 21.05 GiB

Estimate: This model may be loaded based on your resource guardrails settings.

This is useful for capacity planning. If you want to run Gemma 4 alongside other applications, check the estimate at your target context length first.

Here is the full script I used to generate the numbers above. You can swap in any model name and context length list to profile a different model:

#!/usr/bin/env bash
# Profile lms memory estimates across a range of context lengths
# and print the results as a markdown table.

model="google/gemma-4-26b-a4b"
contexts=(4096 8000 16000 24000 32000 48000 64000 96000 128000 200000 256000)

table_contexts=()
table_gpu=()
table_total=()

for ctx in "${contexts[@]}"; do
  # --estimate-only reports memory needs without actually loading the model
  output="$(lms load "$model" --estimate-only --context-length "$ctx" 2>&1)"

  # Pull the reported figures out of the human-readable output
  parsed_context="$(printf '%s\n' "$output" | awk -F': ' '/^Context Length:/ {print $2; exit}')"
  parsed_gpu="$(printf '%s\n' "$output" | awk -F': +' '/^Estimated GPU Memory:/ {print $2; exit}')"
  parsed_total="$(printf '%s\n' "$output" | awk -F': +' '/^Estimated Total Memory:/ {print $2; exit}')"

  table_contexts+=("${parsed_context:-$ctx}")
  table_gpu+=("${parsed_gpu:-N/A}")
  table_total+=("${parsed_total:-N/A}")
done

printf '| Model | Context Length | GPU Memory | Total Memory |\n'
printf '|---|---:|---:|---:|\n'
for i in "${!table_contexts[@]}"; do
  printf '| %s | %s | %s | %s |\n' \
    "$model" "${table_contexts[$i]}" "${table_gpu[$i]}" "${table_total[$i]}"
done

The default lms load or lms chat commands pick reasonable defaults, but you can tune several parameters to match your specific hardware and use case. Here is a practical decision framework.

The memory table above is your starting point. Subtract the OS overhead (macOS typically uses 4-6 GB) from your total memory, then find the largest context length that fits.

Load with a specific context length:

lms load google/gemma-4-26b-a4b --context-length 128000

If you are unsure, always run --estimate-only first. It accounts for flash attention and vision model overhead in its calculation.

On Apple Silicon, the unified memory architecture means CPU and GPU share the same memory pool, so --gpu mostly controls how much computation runs on the GPU versus CPU cores. The default auto setting works well, but you can force full GPU offloading:

lms load google/gemma-4-26b-a4b --gpu=1.0

Use --gpu=max to offload everything possible. On discrete GPU systems (Linux/Windows with NVIDIA cards), this becomes more important because GPU VRAM and system RAM are separate. If your model does not fit entirely in VRAM, partial offloading (--gpu=0.5) splits layers between GPU and CPU, trading some speed for the ability to run larger models.

LM Studio supports concurrent inference through continuous batching, where multiple requests are dynamically combined into a single computation batch. This is useful when serving the model to multiple clients or running parallel tool calls. The feature requires the llama.cpp runtime (v2.0.0+) and is not yet available for the MLX backend.

Configure it through the GUI: open the model loader, toggle Manually choose model load parameters, select a model, then toggle Show advanced settings to set Max Concurrent Predictions (defaults to 4). There is no CLI flag for this setting; it is configured through the desktop app or per-model defaults.

Each parallel slot consumes additional memory proportional to the context length, so on memory-constrained systems, reduce the parallel count or lower the context length to compensate. With Gemma 4 on 48 GB, 2 parallel slots at 48K context is a good balance.

The time-to-live setting automatically unloads models after a period of inactivity, freeing memory:

lms load google/gemma-4-26b-a4b --ttl 1800

That sets a 30-minute idle timeout (value is in seconds). The default is 3600 seconds (1 hour). For shared server setups where multiple models might be needed, shorter TTLs help cycle between models without manual lms unload commands. Set TTL to 0 or -1 to disable auto-unloading.

If you always load Gemma 4 with the same settings, save them as per-model defaults through the desktop app. Navigate to My Models, click the gear icon next to the model, and configure your preferred GPU offloading, context size, and flash attention settings. These defaults apply everywhere, including when loading via lms load from the CLI.

LM Studio supports speculative decoding for dense models, which pairs your main model with a smaller “draft” model to speed up generation. The draft model proposes tokens quickly, and the main model verifies them in batch, which is faster than generating each token independently.

However, speculative decoding is problematic for MoE models like Gemma 4 26B-A4B. During verification, the main model must load the union of all experts activated across all speculative tokens. Since different tokens route to different experts, this blows up memory bandwidth usage and can actually slow things down. Benchmarks on Mixtral showed a 39% speedup on code but a 54% slowdown on math with the same settings, meaning no single configuration works reliably. This is an active research area with approaches like MoE-Spec (expert budgeting) and SP-MoE (expert prefetching) working to solve it, and some newer MoE architectures like Qwen 3.5’s hybrid design are more amenable to speculative approaches. For now, skip speculative decoding with Gemma 4 26B-A4B and rely on its already-fast MoE inference instead.

Flash attention is an optimization that reduces memory usage for the KV cache during inference, letting you fit longer context lengths in the same memory. It is available per-model in LM Studio’s settings. For Gemma 4 on Apple Silicon, enabling flash attention can reduce memory usage at higher context lengths by a meaningful margin. The --estimate-only flag accounts for flash attention in its calculations, so check estimates with and without to see the difference.

Everything above used the headless CLI, but LM Studio also ships a full macOS desktop app. The GUI is useful for visual monitoring and quick experiments before committing to a CLI workflow.

The screenshot below shows the desktop app’s server view with Gemma 4 loaded. A few things worth noting:

http://192.168.1.121:1234

The desktop app also supports Gemma 4’s vision capabilities. In the screenshot below, you can see the model analyzing an image of the Timezone Scheduler promotional graphic. It correctly identifies the title, world map with timezone color bars, the schedule grid comparing Brisbane/New York/London, feature badges, and the tech stack icons at the bottom. It generated 504 tokens at 54.51 tok/sec with a 3.15s time to first token.

Claude Code alias claude-lm with Google Gemma 4 analysing my Timezones Scheduler benchmark comparison GitHub repository.

The system monitor overlay in the screenshots tells the real story of what local inference looks like on hardware. On my M4 Pro (4 E-Cores + 10 P-Cores, 20 GPU-Cores):

This is what makes Apple Silicon compelling for local LLM work. The unified memory architecture means the CPU and GPU share the same memory pool, so there is no copying data between separate CPU RAM and GPU VRAM like on discrete GPU setups. The model loads once into unified memory and both the CPU and GPU access it directly.

Once a model is loaded, start the local server:

lms server start

This exposes an OpenAI-compatible API at http://localhost:1234/v1. Any tool that works with OpenAI’s API format (Continue, Cursor, custom scripts) can point at your local server instead. LM Studio 0.4.0 also added an Anthropic-compatible endpoint at POST /v1/messages, which means tools that speak the Anthropic protocol can connect directly without an adapter. You can change the port with lms server start --port 8080 if 1234 conflicts with something else.
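To see the endpoint in action, here is a minimal curl request in the standard OpenAI chat completions format (a sketch: it assumes the server is running on the default port 1234 with Gemma 4 loaded, and falls through to a notice otherwise):

```shell
# Minimal request against the local OpenAI-compatible endpoint.
payload='{
  "model": "google/gemma-4-26b-a4b",
  "messages": [{"role": "user", "content": "Explain MoE routing in two sentences."}],
  "max_tokens": 200
}'
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload" || echo "LM Studio server not reachable on port 1234"
```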

The server also supports JIT (Just-In-Time) model loading: if a client requests a model that is not currently loaded, LM Studio can auto-load it on demand and auto-unload it after the TTL expires. This is useful for serving multiple models without keeping them all in memory.

To monitor what the server is doing in real time, stream the logs:

lms log stream --source model --stats

This shows each request’s input/output along with tokens/second and latency. For a machine-readable feed, add --json. You can also filter to just server-level events (startup, endpoint hits) with --source server.

Combined with the headless daemon, you can run this on a dedicated machine and serve models across your network. The server is reachable at your machine’s local IP (e.g., http://192.168.1.121:1234), so other devices on the same network can use it as a shared inference endpoint. If you need access control, enable Require Authentication in server settings and generate API tokens with per-token permissions, accessed via the standard Authorization: Bearer $LM_API_TOKEN header.

The Anthropic-compatible endpoint opens up an interesting use case: running Claude Code against a local model instead of the Anthropic API. This means fully offline, zero-cost coding assistance with no data leaving your machine.

I set up a shell function in ~/.zshrc called claude-lm that configures all the necessary environment variables and launches Claude Code pointed at the local LM Studio server:

claude-lm() {
    export ANTHROPIC_BASE_URL=http://localhost:1234
    export ANTHROPIC_AUTH_TOKEN=lmstudio
    export CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY="2"
    export CLAUDE_CODE_NO_FLICKER="0"
    export ANTHROPIC_MODEL="gemma-4-26b-a4b"
    export CLAUDE_CODE_AUTO_COMPACT_WINDOW="48000"
    export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE="90"
    export ANTHROPIC_DEFAULT_OPUS_MODEL="google/gemma-4-26b-a4b"
    export ANTHROPIC_DEFAULT_SONNET_MODEL="google/gemma-4-26b-a4b"
    export ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemma-4-26b-a4b"
    export CLAUDE_CODE_SUBAGENT_MODEL="google/gemma-4-26b-a4b"
    export API_TIMEOUT_MS="30000000"
    export BASH_DEFAULT_TIMEOUT_MS="2400000"
    export BASH_MAX_TIMEOUT_MS="2500000"
    export CLAUDE_CODE_MAX_OUTPUT_TOKENS="8000"
    export CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS="8000"
    export CLAUDE_CODE_ATTRIBUTION_HEADER="0"
    export CLAUDE_CODE_DISABLE_1M_CONTEXT="1"
    export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING="1"
    claude "$@"
}

What the key variables do:

After adding this to ~/.zshrc and running source ~/.zshrc, you can start a fully local Claude Code session with:

claude-lm

It works like normal Claude Code but every request stays on your machine. The trade-off is speed: Gemma 4 at 51 tok/sec is noticeably slower than the Anthropic API for large code generation tasks, but for code review, small edits, and exploration it is perfectly usable.
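To make switching back painless, a companion helper can clear the overrides so the next session talks to the Anthropic cloud again. This is my own addition, not from the post; it simply unsets the routing variables that claude-lm exports:

```shell
# Hypothetical companion to claude-lm: clear the local-server overrides
# so the next `claude` session uses the Anthropic cloud API again.
claude-cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL \
        ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
        ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL
  claude "$@"
}
```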

MoE models are the sweet spot for local inference. Gemma 4's 26B-A4B architecture (26B total, 4B active) delivers roughly 10B-dense-equivalent quality at 4B inference cost. Look for similar MoE models when choosing what to run locally.

The headless daemon changes the workflow. Before 0.4.0, LM Studio required the desktop app open. Now lms daemon up runs in the background and you interact entirely through the CLI or API. This makes it practical for server deployments and SSH sessions.

Context length is the main memory variable. The model itself takes a fixed ~17.6 GiB. Context scaling is roughly linear, so you can pick exactly the trade-off you want between context window and available memory.
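As an illustration of that linearity, here is a back-of-the-envelope interpolation (my own sketch, fit to the two figures reported earlier: 21.05 GiB at 48K context and 37.48 GiB at 256K):

```shell
# Rough memory estimate for Gemma 4 26B-A4B at a given context length,
# linearly interpolated between two measured points (48K -> 21.05 GiB,
# 256K -> 37.48 GiB). Illustration only; use `lms load --estimate-only`
# for real numbers.
estimate_gib() {
  awk -v ctx_k="$1" 'BEGIN {
    slope = (37.48 - 21.05) / (256 - 48)  # GiB per 1K tokens of context
    base  = 21.05 - slope * 48            # weights plus fixed overhead
    printf "%.1f\n", base + slope * ctx_k
  }'
}
estimate_gib 128   # estimate at 128K context: prints 27.4
```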

--estimate-only prevents surprises. Always check memory estimates before loading a large model at an aggressive context length. It takes a second and saves you from OOM situations.

The Anthropic-compatible endpoint is a game changer. Being able to point Claude Code at a local model with a shell alias means you can switch between cloud and local inference depending on the task. Privacy-sensitive code review, offline work, or just saving API costs on exploratory sessions all benefit.

Gemma 4 does not identify itself by name in lms chat. When asked “what model are you?”, it responds generically as “an AI assistant.” This is a minor limitation of how LM Studio handles system prompts, not a Gemma issue. You can override this with a custom system prompt.

The default 48K context is conservative for a model that supports 256K. If you have the memory, it is worth loading with a higher context length for tasks like long document analysis or multi-file code review.

Running Claude Code with a local model is not a drop-in replacement for the Anthropic API. Complex multi-step tasks that rely on Claude’s extended thinking or very large context windows will hit limitations. The local setup works best for focused, single-file tasks where the 48K context window is sufficient.

Memory pressure on a 48 GB machine with Gemma 4 loaded is real. The system used 46.69 GB out of 48 GB with 27.49 GB of swap during the test. If you run memory-hungry applications alongside the model, expect some swap thrashing. A 64 GB or higher configuration would be more comfortable for sustained use.

I am testing other local models alongside Gemma 4 for different use cases: Qwen 3.5 35B for coding tasks, GLM 4.7 Flash for quick drafting, and Nemotron 3 Nano for structured extraction. A comparison post covering where each model performs best is in the pipeline.

If you want to try this setup:

If you’re interested in practical AI building for web apps, developer workflows, and infrastructure, subscribe for future posts. You can also follow my shorter updates on Threads (@george_sl_liu) and Bluesky (@georgesl.bsky.social).

Buy Me A Coffee
