My First Impression on HP Zbook Ultra G1a: Ryzen AI Max+ 395, Strix Halo 128GB

Original link: https://forum.level1techs.com/t/my-first-impression-on-hp-zbook-ultra-g1a-ryzen-ai-max-395-strix-halo-128-gb/232958

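## HP Zbook G1a (AI Max+ 395) – Performance Summary

After a month of use, the HP Zbook G1a (128GB RAM) has proven to be a strong mobile workstation, especially for memory-intensive tasks such as large matrix manipulation and FDTD simulation. Under sustained full load (CPU or GPU), the AI Max+ 395 APU initially draws about 80W, holds about 70W for a few minutes, and then gradually settles at about 45W, with all-core clocks roughly 10% lower than at the start.

In FDTD computation, a memory-bandwidth-bound workload, the Zbook's fast 256-bit LPDDR5X-8000 memory lets it reach about 86% of the performance of a Threadripper PRO 5995WX workstation. Running a local LLM (Phi-4-reasoning-plus Q8) with a 24k context window showed a read bandwidth of about 205 GB/s, more than 80% of the theoretical peak, and the laptop even loaded a 70B model (Llama 3.3 70B Q8) with only 32GB of dedicated GPU memory. The COMSOL CFD-only benchmark finished in about 36 minutes, with a peak read bandwidth of about 72GB/s.

However, Windows power management currently gets in the way of using all cores: the second CCD stays parked until all 16 threads of the first CCD are fully loaded, which calls for a workaround such as Process Lasso. The lack of a BIOS option to disable SMT is a notable omission for a workstation-class device.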

Original post

I got my HP Zbook G1a (395, 128 GB version) a month ago for my research, which involves manipulating big matrices (needs large memory capacity) and running FDTD simulations (need high memory bandwidth). For these two primary workloads, I think Strix Halo is a good fit among the laptops currently on the market.

Below are my brief impressions, focusing on performance numbers.

OS: Windows 11 Pro 24H2
Power plan: Best performance mode except as noted

0. Note on the power draw of the AI Max+ 395 APU in the Zbook Ultra G1a

In Best performance mode, a continuous full CPU load peaks at ~80W and sustains ~70W for a few minutes. The power draw then gradually drops to 45W after about 30 minutes of running, with all-core clock speeds about 10% lower than at the start.

Under GPU load (e.g. running an LLM), the pattern is the same: it starts at ~80W, stays at ~70W for a while, then gradually drops to 45W.

1. CPU-Z, Cinebench R23, 7-Zip
[Screenshots: CPU-Z bench, Cinebench R23, 7-Zip bench, 3DMark Fire Strike, 3DMark Time Spy]

2. Home-made FDTD calculation (comparison with CPU workstations)
FDTD (finite-difference time-domain) is a memory-bandwidth-bound method for the numerical simulation of electrodynamics.
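To see why FDTD is bandwidth-bound, here is a minimal 1D Yee-scheme sketch in Python/NumPy. It is not the author's home-made code (which is not shown); the grid size, Courant number, Gaussian source, and the names `run_fdtd_1d` and `arithmetic_intensity` are illustrative assumptions. Each time step sweeps once over both field arrays while doing only a few floating-point operations per cell, so run time is governed mostly by how fast the arrays stream from memory.

```python
import numpy as np


def run_fdtd_1d(n_cells: int, n_steps: int, courant: float = 0.5) -> np.ndarray:
    """Minimal 1D Yee FDTD (Ez/Hy) in normalized units with PEC walls.

    Hy lives half a cell between Ez nodes and is pre-scaled by the free-space
    impedance, so both updates share one Courant factor (stable for courant <= 1).
    """
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells - 1)
    src = n_cells // 2
    for t in range(n_steps):
        # Magnetic-field update: reads ez and hy, writes hy.
        hy += courant * (ez[1:] - ez[:-1])
        # Electric-field update on interior nodes; ez[0] and ez[-1] stay 0 (PEC).
        ez[1:-1] += courant * (hy[1:] - hy[:-1])
        # Soft (additive) Gaussian pulse injected at the centre of the grid.
        ez[src] += np.exp(-(((t - 30) / 10.0) ** 2))
    return ez


def arithmetic_intensity(bytes_per_value: int = 8) -> float:
    """Rough flop-per-byte estimate for one cell and one time step.

    Counts 3 flops (sub, mul, add) per field update and assumes each update
    streams 2 reads + 1 write from DRAM, ignoring cache reuse of neighbours
    and the extra traffic from NumPy temporaries.
    """
    flops = 2 * 3
    values_moved = 2 * 3
    return flops / (values_moved * bytes_per_value)


if __name__ == "__main__":
    field = run_fdtd_1d(n_cells=2_000, n_steps=1_000)
    print(f"max |Ez| = {np.abs(field).max():.3f}")
    print(f"~{arithmetic_intensity():.3f} flop/byte in float64")
```

At roughly 0.125 flop per byte in double precision, a sweep like this leaves the cores mostly waiting on memory, which is consistent with describing FDTD as bandwidth-bound.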

Results (time steps per second, higher is better)

  • AI Max+ 395 (256-bit LPDDR5X-8000): 10.4
  • Dual Epyc 9654 (24-channel DDR5-4800 total): 54.31
  • Threadripper PRO 5995WX (8-channel DDR4-3200): 12.1
  • Core i9-7920X (4-channel DDR4-2933): 4.49

It is amazing to see that this small laptop delivers about 86% of the TR 5995WX workstation's performance (10.4 vs. 12.1 steps/s) in a memory-bandwidth-bound workload.
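To put these results in context, the sketch below computes each system's theoretical peak DRAM bandwidth from the memory configuration listed above (bus width times transfer rate, assuming 64-bit channels and ignoring ECC bits) and divides the measured steps/s by it. The peak figures are arithmetic upper bounds, not measurements, and `SYSTEMS`, `peak_bandwidth_gbs`, and `steps_per_gbs` are hypothetical names.

```python
# name -> (total memory bus width in bits, transfer rate in MT/s, FDTD steps/s)
SYSTEMS = {
    "AI Max+ 395": (256, 8000, 10.4),
    "2x Epyc 9654": (24 * 64, 4800, 54.31),
    "TR 5995WX": (8 * 64, 3200, 12.1),
    "i9-7920X": (4 * 64, 2933, 4.49),
}


def peak_bandwidth_gbs(bus_bits: int, mts: int) -> float:
    """Theoretical peak in GB/s (decimal): bytes per transfer * mega-transfers/s."""
    return bus_bits / 8 * mts / 1000


def steps_per_gbs(name: str) -> float:
    """Measured FDTD steps/s per GB/s of theoretical peak bandwidth."""
    bus_bits, mts, steps = SYSTEMS[name]
    return steps / peak_bandwidth_gbs(bus_bits, mts)


if __name__ == "__main__":
    for name, (bits, mts, steps) in SYSTEMS.items():
        peak = peak_bandwidth_gbs(bits, mts)
        print(f"{name:13s} peak {peak:6.1f} GB/s  {steps:6.2f} steps/s  "
              f"{steps_per_gbs(name):.4f} steps/s per GB/s")
    ratio = SYSTEMS["AI Max+ 395"][2] / SYSTEMS["TR 5995WX"][2]
    print(f"AI Max+ 395 vs TR 5995WX: {ratio:.0%}")
```

By this arithmetic the 395 has about 25% more theoretical bandwidth than the 5995WX (256 vs. 204.8 GB/s) yet completes about 86% as many steps per second: roughly 0.041 steps/s per GB/s versus about 0.059 for both the 5995WX and the dual Epyc. The calculation alone cannot say why.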

3. Local LLM and memory bandwidth

I’m new to running local LLMs. I used LM Studio and simply followed its basic instructions, so please note that some details of these results could be misleading.

I ran the Phi-4-reasoning-plus Q8 model (15.5 GB) and asked it to evaluate an integral using complex analysis. The context window was set to 24k, and the Vulkan backend was used to run it on the GPU. (The integral is quite tricky, yet the answer is correct. Amazing. 😲)

I have heard that memory bandwidth matters for LLM inference. This laptop showed a read bandwidth of 205 GB/s while running the model, which is more than 80% of the theoretical peak (256 GB/s for 256-bit LPDDR5X-8000).

Interestingly, in my experience, allocating a large amount of dedicated GPU memory is not that important. The laptop was able to load Llama 3.3 70B Q8 (~75 GB) with just 32 GB of dedicated GPU memory; the rest of the data was loaded into “shared” GPU memory. The same memory bandwidth (~200 GB/s) was observed in this case as well.
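A common rule of thumb shows why bandwidth matters here: when generating each token, a dense model reads essentially all of its weights once, so memory bandwidth divided by model size gives a rough ceiling on decode speed. The sketch below applies that rule to the figures quoted above and also checks the "more than 80% of peak" claim against the 256 GB/s theoretical peak of 256-bit LPDDR5X-8000. It ignores KV-cache traffic, compute limits, and prompt processing; the function and variable names are hypothetical.

```python
def peak_bandwidth_gbs(bus_bits: int, mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    return bus_bits / 8 * mts / 1000


def max_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Rough decode ceiling for a dense model: one full weight read per token."""
    return bandwidth_gbs / model_gb


PEAK = peak_bandwidth_gbs(256, 8000)  # 256-bit LPDDR5X-8000 -> 256 GB/s

# (observed read bandwidth in GB/s, model size in GB), figures from the post
RUNS = {
    "Phi-4-reasoning-plus Q8": (205, 15.5),
    "Llama 3.3 70B Q8": (200, 75),
}

if __name__ == "__main__":
    for name, (bw, size) in RUNS.items():
        print(f"{name}: {bw / PEAK:.0%} of peak, "
              f"<= ~{max_tokens_per_s(bw, size):.1f} tokens/s")
```

This works out to a ceiling of roughly 13 tokens/s for the 15.5 GB Phi-4 model and under 3 tokens/s for the ~75 GB 70B model; these are estimates, not speeds measured in the post, and real decode rates will be lower.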

4. COMSOL Multiphysics

For benchmark details, you can refer to the following topic.

I have run the CFD-only model, and here are the results.

  • 36m 48s (-np 16)
  • 35m 56s (-np 16 -blas aocl)

During the benchmark, the peak read bandwidth observed was ~72 GB/s.
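The difference between the two COMSOL runs is small. The sketch below converts both wall-clock times to seconds, computes the gain from `-blas aocl`, and expresses the ~72 GB/s peak read bandwidth as a fraction of the 256 GB/s theoretical peak. `to_seconds` and the variable names are hypothetical.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse a wall-clock time written like '36m 48s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+)s\s*", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(g) for g in match.groups())
    return minutes * 60 + seconds


run_np16 = to_seconds("36m 48s")       # -np 16
run_np16_aocl = to_seconds("35m 56s")  # -np 16 -blas aocl

if __name__ == "__main__":
    saved = run_np16 - run_np16_aocl
    print(f"AOCL saves {saved} s ({saved / run_np16:.1%}), "
          f"speed-up x{run_np16 / run_np16_aocl:.3f}")
    print(f"peak read bandwidth: {72 / 256:.0%} of theoretical peak")
```

So AOCL shaves 52 s (about 2.4%) off the run, and the observed peak read bandwidth is only about 28% of the theoretical maximum, well below the ~80% seen while running the LLM.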

5. Things that make squeezing out performance tricky on Windows

  • Regardless of the power plan, the second CCD remains parked by default, even when running on AC power, and it doesn’t wake up unless all 16 threads of the first CCD (8 cores + 8 SMT threads) are fully utilized. As a result, if you run a 16-threaded program, the second CCD is never activated. I’m not sure whether this behavior is controlled by AMD or HP, but I hope this policy changes in a future update.

  • So, to make use of 16 threads across the two CCDs while running the COMSOL benchmark, I had to use Process Lasso to manually wake up the second CCD.

  • It would be best if HP provided an option to disable SMT in the BIOS, but I could not find one. Given that this laptop is intended for workstation use, I find this somewhat disappointing.
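The post does not say exactly which Process Lasso setting was used. One scriptable approximation is to set the solver's CPU affinity so that one thread per physical core lands on each CCD, which also sidesteps SMT siblings in the absence of a BIOS switch. The sketch below assumes a logical-processor numbering where the two SMT threads of core k are 2k and 2k+1 and CCD 0's cores come first; verify that against the real topology (e.g. with Sysinternals Coreinfo) before relying on it. It uses the third-party psutil package, the helper names are hypothetical, and whether pinning alone wakes the parked CCD the way Process Lasso did is an open question.

```python
def one_thread_per_core(n_ccds: int = 2, cores_per_ccd: int = 8,
                        threads_per_core: int = 2) -> list[int]:
    """Logical CPU indices that select the first SMT thread of every core.

    ASSUMPTION: SMT siblings are numbered adjacently (core k -> 2k, 2k+1)
    and all of CCD 0's cores precede CCD 1's. Check the real topology first.
    """
    return [core * threads_per_core for core in range(n_ccds * cores_per_ccd)]


def ccd_of(logical_cpu: int, cores_per_ccd: int = 8,
           threads_per_core: int = 2) -> int:
    """CCD index of a logical CPU under the same numbering assumption."""
    return logical_cpu // (cores_per_ccd * threads_per_core)


def pin_process(pid: int, cpus: list[int]) -> None:
    """Restrict an existing process to the given logical CPUs (needs psutil)."""
    import psutil  # third-party: pip install psutil

    psutil.Process(pid).cpu_affinity(cpus)
```

For example, `pin_process(pid, one_thread_per_core())` with the solver's process ID would confine its 16 worker threads to logical CPUs 0, 2, ..., 30, i.e. eight distinct cores on each CCD under the assumed numbering.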