April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

Original link: https://gist.github.com/greenstevester/fc49b4e60a4fef9effc79066c1033ae5

## Running Gemma 4 26B on a Mac with Ollama

This guide walks through setting up and running the Gemma 4 26B large language model locally with Ollama on a Mac mini with Apple Silicon (M1/M2/M3/M4/M5).

**Installation:** Requires at least 24GB of unified memory, macOS, and Homebrew. Install Ollama via `brew install --cask ollama-app`, which provides both the app and the CLI tool. The initial model download is about 17GB. Verify the installation with `ollama list` and test with `ollama run gemma4:26b "Hello, what model are you?"`. On Apple Silicon, Ollama automatically uses Apple's MLX framework for faster, GPU-accelerated inference.

**Optimization:** For continuous use, enable "Launch at Login" from the Ollama menu bar icon. You can create a launch agent that preloads Gemma 4 into memory, and set the `OLLAMA_KEEP_ALIVE` environment variable to `-1` to prevent the model from being unloaded after inactivity. Ollama v0.19+ has improved caching and supports NVIDIA's NVFP4 format for better efficiency. Gemma 4 26B takes roughly 20GB of memory when loaded. The local API is available at `http://localhost:11434`.

## Open-Weight Model Setup and Early-Adopter Issues (April 2026)

This Hacker News discussion centers on setting up Gemma 4 (26B) on a Mac mini and highlights common pitfalls of adopting open-weight models early. Users report mixed results, with performance varying by inference engine, quantization, and hardware.

A key takeaway is that initial implementations are often buggy. Users should expect to update their setups frequently and re-download quantizations as issues are discovered, especially around tool calling. Even when a model can *load*, working functionality is not guaranteed.

Alternatives such as LM Studio and llama.cpp come up repeatedly, and some prefer them over Ollama for performance or open-source reasons. There is debate over whether local models can truly compete with paid services like Claude; most participants think they are not yet full replacements, but are useful for privacy or experimentation.

The conversation stresses hands-on testing over relying on early benchmarks or LLM-generated advice, since the landscape is changing rapidly. Many users suggest trying a hosted service first to evaluate a model's capabilities before investing in local hardware.

Prerequisites
  • Mac mini with Apple Silicon (M1/M2/M3/M4/M5)
  • At least 24GB unified memory for Gemma 4 26B
  • macOS with Homebrew installed

Install the Ollama macOS app via Homebrew cask (includes auto-updates and MLX backend):

brew install --cask ollama-app

This installs:

  • Ollama.app in /Applications/
  • ollama CLI at /opt/homebrew/bin/ollama

The Ollama icon will appear in the menu bar. Wait a few seconds for the server to initialize.

Verify the server is running:

curl http://localhost:11434
# Ollama is running

Download the model:

ollama pull gemma4:26b

This downloads ~17GB. Verify:

ollama list
# NAME          ID              SIZE     MODIFIED
# gemma4:26b    5571076f3d70    17 GB    ...
ollama run gemma4:26b "Hello, what model are you?"

Check that it's using GPU acceleration:

ollama ps
# Should show CPU/GPU split, e.g. 14%/86% CPU/GPU

Step 5: Configure Auto-Start on Login

5a. Ollama App — Launch at Login

Click the Ollama icon in the menu bar > Launch at Login (enable it).

Alternatively, go to System Settings > General > Login Items and add Ollama.
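
If you prefer to script the login item instead of clicking through System Settings, AppleScript via System Events can do the same thing. A sketch, assuming Ollama.app is installed in /Applications (macOS only):

```shell
# Add Ollama.app as a login item (macOS only; path is an assumption)
osascript -e 'tell application "System Events" to make login item at end with properties {path:"/Applications/Ollama.app", hidden:false}'
```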

5b. Auto-Preload Gemma 4 on Startup

Create a launch agent that loads the model into memory after Ollama starts and keeps it warm:

cat << 'EOF' > ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.preload-gemma4</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>run</string>
        <string>gemma4:26b</string>
        <string></string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>StandardOutPath</key>
    <string>/tmp/ollama-preload.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama-preload.log</string>
</dict>
</plist>
EOF

Load the agent:

launchctl load ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist

This sends an empty prompt to ollama run every 5 minutes, keeping the model warm in memory.

5c. Keep Models Loaded Indefinitely

By default, Ollama unloads models after 5 minutes of inactivity. To keep them loaded forever:

launchctl setenv OLLAMA_KEEP_ALIVE "-1"

Then restart Ollama for the change to take effect.

Note: launchctl setenv applies to the current login session only and does not survive a reboot. To persist the setting, use a dedicated launch agent; adding export OLLAMA_KEEP_ALIVE="-1" to ~/.zshrc only affects CLI-launched sessions, not the menu-bar app.
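
One way to make the setting survive reboots is a small launch agent that re-runs launchctl setenv at login. A sketch, following the same pattern as the preload agent above (the label com.ollama.keepalive is arbitrary):

```shell
# Launch agent that sets OLLAMA_KEEP_ALIVE for the GUI session at every login
cat << 'EOF' > ~/Library/LaunchAgents/com.ollama.keepalive.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.keepalive</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/launchctl</string>
        <string>setenv</string>
        <string>OLLAMA_KEEP_ALIVE</string>
        <string>-1</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
launchctl load ~/Library/LaunchAgents/com.ollama.keepalive.plist
```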

Step 6: Verify Everything Works

# Check Ollama server is running
ollama list

# Check model is loaded in memory
ollama ps

# Check launch agent is registered
launchctl list | grep ollama

Expected output from ollama ps:

NAME          ID              SIZE     PROCESSOR          CONTEXT    UNTIL
gemma4:26b    5571076f3d70    20 GB    14%/86% CPU/GPU    4096       Forever

Ollama exposes a local API at http://localhost:11434. Use it with coding agents:

# Chat completion (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4:26b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
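
The response comes back as standard OpenAI-style chat-completion JSON. A minimal sketch of extracting the assistant's reply, using an illustrative canned payload rather than real model output:

```shell
# SAMPLE is an illustrative response body, not real gemma4 output
SAMPLE='{"choices":[{"message":{"role":"assistant","content":"Hello! How can I help?"}}]}'
# Pull out choices[0].message.content with python3
echo "$SAMPLE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```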
| Command | Description |
| --- | --- |
| `ollama list` | List downloaded models |
| `ollama ps` | Show running models & memory usage |
| `ollama run gemma4:26b` | Interactive chat |
| `ollama stop gemma4:26b` | Unload model from memory |
| `ollama pull gemma4:26b` | Update model to latest version |
| `ollama rm gemma4:26b` | Delete model |

Uninstall / Remove Auto-Start

# Remove the preload agent
launchctl unload ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
rm ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist

# Uninstall Ollama
brew uninstall --cask ollama-app

What's New in Ollama v0.19+ (March 31, 2026)

MLX Backend on Apple Silicon

On Apple Silicon, Ollama automatically uses Apple's MLX framework for faster inference — no manual configuration needed. M5/M5 Pro/M5 Max chips get additional acceleration via GPU Neural Accelerators. M4 and earlier still benefit from general MLX speedups.

Ollama now supports NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements for inference workloads. As more inference providers scale with NVFP4, Ollama users can expect results consistent with what they would see in a production environment, and Ollama can run models optimized with NVIDIA's Model Optimizer.

Improved Caching for Coding and Agentic Tasks

  • Lower memory utilization: Ollama reuses its cache across conversations, meaning less memory utilization and more cache hits when branching with a shared system prompt — especially useful with tools like Claude Code.
  • Intelligent checkpoints: Ollama stores snapshots of its cache at intelligent locations in the prompt, resulting in less prompt processing and faster responses.
  • Smarter eviction: Shared prefixes survive longer even when older branches are dropped.
  • Memory: Gemma 4 26B uses ~20GB when loaded. On a 24GB Mac mini, this leaves ~4GB for the system — close memory-heavy apps before running.
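
Given the tight margin on a 24GB machine, it can help to check free memory before loading the model. A rough sketch (macOS only; the 16384-byte page size assumes Apple Silicon):

```shell
# Rough free-memory estimate from vm_stat (macOS; Apple Silicon uses 16KB pages)
vm_stat | awk '/Pages free/ {printf "%.1f GB free\n", $3 * 16384 / 1024^3}'
```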