My first impressions on ROCm and Strix Halo

Original link: https://blog.marcoinacio.com/posts/my-first-impressions-rocm-strix-halo/

## ROCm & Strix Halo: First Impressions

This summary describes a successful ROCm setup on Ubuntu 24.04 with a Strix Halo GPU sharing 128GB of memory between the CPU and GPU. A BIOS update was essential for PyTorch to recognize the GPU. Configuration involved reducing the reserved video memory (down to 512MB) and using the GTT for efficient memory sharing, while leaving 4-12GB for kernel stability. Key steps include editing `/etc/default/grub` with specific `ttm.pages_limit` and `amdgpu.gttsize` values, then updating grub. PyTorch is set up with `uv`, using a custom index pointing at the ROCm-specific PyTorch wheels. The setup runs a Qwen3.6 model via `llama.cpp` inside a Podman container with GPU acceleration; downloading the model and converting it to `gguf` format are also covered. Finally, an Opencode configuration is provided for integrating with the locally running llama.cpp instance. Despite some initial challenges, the author reports a positive experience, successfully using PyTorch and running a large language model with a sizable context window.


Here I'll share my first impressions with ROCm and Strix Halo and how I've set up everything.

[Screenshot: Strix Halo on htop, with 128GB efficiently shared between the CPU and GPU.]

I'm used to working with Ubuntu, so I stuck with it in the supported 24.04 LTS version, and just followed the official installation instructions.

It seems that things wouldn't work without a BIOS update: PyTorch was unable to find the GPU. The update was easily done from the BIOS settings: the machine was able to connect to my Wi-Fi network and download it automatically.

Also in the BIOS settings, you might need to make sure the reserved video memory is set to a low value, letting memory be shared between the CPU and GPU through the GTT. The reserved memory can be as low as 512MB.

Implications:

  • The CPU is not able to use the GPU reserved memory.
  • The GPU can use the total of Reserved + GTT, but utilizing both simultaneously can be less efficient than a single large GTT pool due to fragmentation and addressing overhead.
  • Some legacy games or software might sadly see the GPU memory as only 512 MB and refuse to work, though this hasn't happened to me so far.
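To see how the split actually looks on a running system, you can read the amdgpu driver's sysfs counters. This is a minimal sketch: the paths and the `card0` index assume the amdgpu driver and may differ on your machine.

```python
from pathlib import Path

def read_mib(path):
    """Read an amdgpu sysfs byte counter; return MiB, or None if absent."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text()) // (1024 * 1024)

# card0 is an assumption; check /sys/class/drm/ for your actual card index.
vram = read_mib("/sys/class/drm/card0/device/mem_info_vram_total")
gtt = read_mib("/sys/class/drm/card0/device/mem_info_gtt_total")
print(f"Reserved VRAM: {vram} MiB, GTT pool: {gtt} MiB")
```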

Then on /etc/default/grub, I've made this change:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=32768000 amdgpu.gttsize=114688"

and then ran sudo update-grub.

Note that amdgpu.gttsize shouldn't cover the whole system memory: leave some memory (I've read recommendations ranging from 4GB to 12GB) reserved for the CPU (total memory minus reserved VRAM minus GTT), for the sake of the stability of the Linux kernel.
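As a sanity check on those numbers, keep in mind that ttm.pages_limit is counted in 4 KiB pages while amdgpu.gttsize is in MiB. A quick back-of-the-envelope calculation, assuming a 128 GiB machine with 512 MB reserved as VRAM in the BIOS:

```python
# Convert the kernel parameters above into GiB and see what's left for the CPU.
PAGE_SIZE_KIB = 4               # TTM pages are 4 KiB

pages_limit = 32768000          # ttm.pages_limit (in pages)
gttsize_mib = 114688            # amdgpu.gttsize (in MiB)

ttm_gib = pages_limit * PAGE_SIZE_KIB / (1024 * 1024)
gtt_gib = gttsize_mib / 1024
total_gib = 128                 # assumed total system memory
reserved_gib = 0.5              # assumed BIOS-reserved VRAM

cpu_left = total_gib - reserved_gib - gtt_gib
print(f"TTM page limit: {ttm_gib:.0f} GiB")   # upper bound on all TTM pages
print(f"GTT pool:       {gtt_gib:.0f} GiB")
print(f"Left for CPU:   {cpu_left:.1f} GiB")
```

With these values the GTT pool is 112 GiB, leaving roughly 15.5 GiB to the CPU, comfortably within the 4-12GB-plus margin mentioned above.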

Setting up PyTorch was somewhat tricky because of its weird dependency graph, but eventually I got it working with this pyproject.toml:

[project]
name = "myproject"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "torch==2.11.0+rocm7.2",
    "triton-rocm",
]

[tool.uv]
environments = ["sys_platform == 'linux'"]

[[tool.uv.index]]
name = "pytorch-rocm"
url = "https://download.pytorch.org/whl/rocm7.2"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-rocm" }
torchvision = { index = "pytorch-rocm" }
triton-rocm = { index = "pytorch-rocm" }

and you can even add this to your .bashrc:

alias pytorch='''uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
    --index-strategy unsafe-best-match \
    --with torch==2.11.0+rocm7.2,triton-rocm \
    ipython -c "import torch; print(f\"ROCM: {torch.version.hip}\"); \
    print(f\"GPU available: {torch.cuda.is_available()}\"); import torch.nn as nn" -i
'''
To serve the model, I run llama.cpp's ROCm server image with Podman:

podman run --rm -it --name qwen-coder --device /dev/kfd --device /dev/dri \
--security-opt label=disable --group-add keep-groups -e HSA_OVERRIDE_GFX_VERSION=11.5.0 \
-p 8080:8080 -v /some_path/models:/models:z  ghcr.io/ggml-org/llama.cpp:server-rocm \
-m /models/qwen3.6/model.gguf -ngl 99 -c 327680 --host 0.0.0.0 --port 8080 \
--flash-attn on --no-mmap
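Once the container is up, the server exposes llama.cpp's OpenAI-compatible API. A minimal smoke test from Python might look like the sketch below; the endpoint matches the port flags above, and the model name is an assumption (llama.cpp's server typically ignores it, since the model is fixed at startup).

```python
import json
from urllib import request

def build_chat_request(prompt, model="qwen3.6"):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def ask(prompt, base_url="http://localhost:8080/v1"):
    """Send one chat completion request and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("Write a haiku about shared memory")  # requires the server to be running
```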

Note that you can easily download the model with:

uvx hf download Qwen/Qwen3.6-35B-A3B --local-dir /some_path/models/qwen3.6

And convert to gguf with the convert_hf_to_gguf.py script from the llama.cpp repo:

git clone https://github.com/ggerganov/llama.cpp.git /some_path/llama.cpp
cd /some_path/models/qwen3.6 &&
uvx --extra-index-url https://download.pytorch.org/whl/rocm7.2 \
    --index-strategy unsafe-best-match \
    --with torch==2.11.0+rocm7.2,triton-rocm,transformers \
    ipython /some_path/llama.cpp/convert_hf_to_gguf.py \
    -- . --outfile model.gguf

I'm using Podman to run Opencode; see my repo for how to set it up.

And this is my config to have it work with Llama.cpp:

{
    "$schema": "https://opencode.ai/config.json",
    "provider": {
        "local": {
            "options": {
                "baseURL": "http://localhost:8080/v1",
                "apiKey": "any-string",
                "reasoningEffort": "auto",
                "textVerbosity": "high",
                "supportsToolCalls": true
            },
            "models": {
                "qwen-coder-local": {}
            }
        }
    },
    "model": "local/qwen-coder-local",
    "permission": {
        "*": "ask",
        "read": {
            "*": "allow",
            "*.env": "deny",
            "**/secrets/**": "deny"
        },
        "bash": "allow",
        "edit": "allow",
        "glob": "allow",
        "grep": "allow",
        "websearch": "allow",
        "codesearch": "allow",
        "webfetch": "allow"
    },
    "disabled_providers": [
        "opencode"
    ]
}

So, as promised, my first impressions: so far, so good. I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window. There were some rough edges, but I think it was quite worth it.
