Raspberry Pi's New AI Hat Adds 8GB of RAM for Local LLMs

Original link: https://www.jeffgeerling.com/blog/2026/raspberry-pi-ai-hat-2/

## Raspberry Pi Releases the AI HAT+ 2: A Niche Solution

Raspberry Pi has just released the AI HAT+ 2 ($130), which pairs a Hailo 10H chip with 8GB of LPDDR4X RAM and is designed to offload AI processing from the Pi's main CPU. Although it can run LLMs entirely on its own, testing shows the Pi 5's CPU generally *outperforms* the HAT+ 2, because of power limits (3W versus 10W) and, more importantly, memory limits: 8GB is not enough for many medium-size LLMs. The AI HAT+ 2 does well at computer vision, but for those tasks it offers little advantage over the existing, cheaper AI HAT ($110) or AI Camera ($70). Attempts to use its "mixed mode", running vision and inference simultaneously, all failed due to software problems. Ultimately, the AI HAT+ 2 feels more like a development kit for integrating the Hailo 10H into other devices than a plug-and-play solution for typical Pi users. It has potential in power-constrained applications, but its benefits are currently limited to very specific, niche use cases. It highlights a common pattern in "AI" hardware: the hardware ships before fully usable software.

Hacker News discussion (3 comments):

phito: Sounds like some product manager just wanted to shoehorn AI marketing somewhere it doesn't fit.

agent013: A good example of how "can run LLMs" is not the same as "makes sense to run LLMs". A classic case of spec-sheet numbers not translating into real user experience.

dwedge: > In practice, it's not as amazing as it sounds.
8GB of RAM for AI on a Pi sounds insufficient even from the headline alone.

Original article
Raspberry Pi AI HAT+ 2

Today Raspberry Pi launched their new $130 AI HAT+ 2 which includes a Hailo 10H and 8 GB of LPDDR4X RAM.

With that, the Hailo 10H is capable of running LLMs entirely standalone, freeing the Pi's CPU and system RAM for other tasks. The chip runs at a maximum of 3W, delivering 40 TOPS of INT8 NPU inference performance in addition to machine vision performance equivalent to the 26 TOPS (INT4) of the earlier AI HAT's Hailo-8.

In practice, it's not as amazing as it sounds.

You still can't upgrade the RAM on the Pi, but at least this way if you do have a need for an AI coprocessor, you don't have to eat up the Pi's memory to run things on it.

And it's a lot cheaper and more compact than running an eGPU on a Pi. In that sense, it's more useful than the silly NPUs Microsoft forces into their 'AI PCs'.

But it's still a solution in search of a problem, in all but the most niche of use cases.

Besides feeling like I'm living in the world of the Turbo Encabulator every time I'm testing AI hardware, I find the marketing of these things to be very vague, and the applications not very broad.

For example, the Hailo 10H is advertised as being used for a Fujitsu demo of automatic shrink detection for a self-checkout.

That's certainly not a worthless use case, but it's not something I've ever needed to do. I have a feeling this board is meant more for development, for people who want to deploy the 10H in other devices, rather than as a total solution to problems individual Pi owners need to solve.

Especially when it comes to the headline feature: running inference, like with LLMs.

Video

I also published a video with all the information in this blog post, but if you enjoy text more than video, scroll on past—it doesn't offend me!

LLM performance on the AI HAT+ 2

I ran everything on an 8 gig Pi 5, so I could get an apples-to-apples comparison, running the same models on the Pi's CPU as I did on the AI HAT's NPU.

They both have the same 8GB LPDDR4X RAM configuration, so ideally, they'd have similar performance.
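For context, measuring tokens per second on the CPU side can be as simple as timing a generation with the llama-cpp-python bindings. This is only a rough sketch with a placeholder model and prompt, not the exact harness used for the charts below:

```python
# Rough tokens-per-second measurement on the CPU side with llama-cpp-python
# (pip install llama-cpp-python). Model path and prompt are placeholders,
# not the exact setup used for the charts in this post.
import time
from llama_cpp import Llama

llm = Llama(model_path="qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
            n_ctx=2048, n_threads=4)  # four threads for the Pi 5's four cores

start = time.time()
out = llm("Write a short haiku about single-board computers.", max_tokens=128)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s")
```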

I tested every model Hailo put out so far, and compared them, Pi 5 versus Hailo 10H:

Raspberry Pi AI HAT+ 2 - Inference performance NPU vs CPU

The Pi's built-in CPU trounces the Hailo 10H.

The Hailo is only close, really, on Qwen2.5 Coder 1.5B.

It is slightly more efficient in most cases:

Raspberry Pi AI HAT+ 2 - Inference efficiency NPU vs CPU

But looking more closely at power draw, we can see why the Hailo doesn't keep up:

Raspberry Pi AI HAT+ 2 - Power draw NPU vs CPU

The Pi's CPU is allowed to max out its power limits (10W on the SoC), which are a lot higher than the Hailo's (3W).
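In other words, throughput and efficiency diverge: a chip drawing 10W can win on tokens per second while roughly tying (or even losing) on tokens per joule. The numbers below are purely illustrative, not measurements from the charts above:

```python
# Tokens per joule from throughput and power draw. Since 1 W = 1 J/s,
# tok/s divided by watts gives tok/J. Figures below are illustrative only.
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    return tokens_per_second / watts

cpu = tokens_per_joule(tokens_per_second=10.0, watts=10.0)   # hypothetical CPU run
hailo = tokens_per_joule(tokens_per_second=4.0, watts=3.0)   # hypothetical Hailo run
print(f"CPU: {cpu:.2f} tok/J  Hailo: {hailo:.2f} tok/J")
```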

Qwen 30B on a Pi

So power holds it back, but it's the 8 GB of RAM that holds back the LLM use case the most (versus just running on the Pi's CPU). The Pi 5 can be bought in up to a 16 GB configuration. That's as much as you get in decent consumer graphics cards.

Because of that, many quantized medium-size models target 10-12 GB of RAM usage (leaving space for context, which eats up another 2+ GB of RAM).

A couple weeks ago, ByteShape got Qwen3 30B A3B Instruct to fit on a 16GB Pi 5. Now this post isn't about LLMs, but the short of it is they found a novel way to compress the model to fit in 10 GB of RAM.
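As a back-of-envelope check (my own arithmetic, not from the ByteShape post), fitting roughly 30 billion parameters into a 10 GB weight budget means averaging well under 3 bits per parameter, which is why plain 4-bit quantization isn't enough:

```python
# Back-of-envelope: average bits per weight needed to fit ~30B parameters
# in ~10 GB. Ignores embeddings, KV cache, and runtime overhead.
params = 30.5e9                   # approximate Qwen3 30B A3B parameter count
budget_bits = 10 * 1024**3 * 8    # 10 GiB expressed in bits

print(f"{budget_bits / params:.2f} bits per parameter on average")  # ~2.8
```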

A little bit of quality is lost, but like a JPEG, it's still good enough to ace all the contrived tests (like building a TODO list app, or sorting a complex list) that the tiny models I ran on the Hailo 10H didn't complete well (see the video earlier in this post for details).

Raspberry Pi 16GB running Qwen3 30B model

To test the 30B model, I installed llama.cpp following this guide from my blog, and downloaded the compressed model.
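For readers who prefer the Python bindings over the CLI, the same kind of run looks roughly like this. It's a sketch with llama-cpp-python (the post itself used the llama.cpp CLI), and the GGUF filename is a placeholder, not the actual ByteShape release name:

```python
# Sketch of the generation step with llama-cpp-python; the GGUF filename
# below is a placeholder for the compressed Qwen3 30B A3B model.
from llama_cpp import Llama

llm = Llama(model_path="qwen3-30b-a3b-instruct-compressed.gguf",
            n_ctx=8192, n_threads=4)

resp = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": ("Generate a single-page TODO list app as one HTML file: "
                    "unlimited items, drag-to-reorder, and checked items move "
                    "to the bottom of the list."),
    }],
    max_tokens=4096,
)
print(resp["choices"][0]["message"]["content"])
```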

I asked it to generate a single page TODO list app, and it's still not a speed demon (this is a Pi CPU with LPDDR4x RAM we're talking about), but after a little while, it gave me this:

Raspberry Pi 16GB Qwen3 Generated TODO list app

It met all my requirements:

  • I can type in as many items as I want
  • I can drag them around to rearrange them
  • I can check off items and they go to the bottom of the list...

It's honestly crazy how many small tasks you can do even with free local models... even on a Pi. Natural Language Programming was just a dream back when I started my career.

Besides being angry Google, OpenAI, Anthropic and all these other companies are consuming all the world's money and resources doing this stuff—not to mention destroying the careers of thousands of junior developers—it is kinda neat to see NLP work for very tightly defined examples.

Benchmarking computer vision

But I don't think this HAT is the best choice to run local, private LLMs (at least not as a primary goal).

What it is good for, is vision processing. But the original AI HAT was good for that too!

In my testing, Hailo's hailo-rpi5-examples were not yet updated for this new HAT, and even if I specified the Hailo 10H manually, model files would not load, or I ran into errors once the board was detected.

But Raspberry Pi's models ran, so I tested them with a Camera Module 3:

Raspberry Pi AI HAT+ 2 running YOLO vision model at 30fps

I pointed it over at my desk, and it was able to pick out things like my keyboard, my monitor (which it thought was a TV), my phone, and even the mouse tucked away in the back.

It all ran quite fast—and 10x faster than on the Pi's CPU—but the problem is I can do the same thing with the original AI HAT ($110)—or the AI Camera ($70).

If you just need vision processing, I would stick with one of those.
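For reference, the detection loop in those examples has roughly the following shape. This sketch is modeled on the picamera2 Hailo helper written for the earlier Hailo-8 AI Kit; whether it drives the 10H unchanged, and the exact .hef model path, are assumptions on my part:

```python
# Rough shape of a picamera2 + Hailo detection loop, modeled on the Hailo-8
# AI Kit examples shipped with picamera2. Hailo 10H support and the .hef
# path are assumptions; post-processing of detections is omitted.
from picamera2 import Picamera2
from picamera2.devices import Hailo

MODEL = "/usr/share/hailo-models/yolov8s.hef"  # placeholder model path

with Hailo(MODEL) as hailo, Picamera2() as picam2:
    model_h, model_w, _ = hailo.get_input_shape()

    # Low-res stream sized to the model input; main stream left for display.
    config = picam2.create_preview_configuration(
        main={"size": (1280, 960), "format": "XRGB8888"},
        lores={"size": (model_w, model_h), "format": "RGB888"})
    picam2.configure(config)
    picam2.start()

    while True:
        frame = picam2.capture_array("lores")
        results = hailo.run(frame)  # raw model output; decode/draw boxes here
        print(results)
```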

The headline feature of the AI HAT+ 2 is the ability to run in a 'mixed' mode, where it can process machine vision (frames from a camera or video feed), while also running inference (like an LLM or text-to-speech).

Raspberry Pi AI HAT+ 2 mixed inference and vision not working

Unfortunately, when I tried running two models simultaneously, I ran into segmentation faults or 'device not ready', and lacking any working examples from Hailo, I had to give up on getting that working in time for this post.

Just like the original AI HAT, there's some growing pains.

It seems like with most hardware with "AI" in the name, it's hardware-first, then software comes later—if it comes at all. At least with Raspberry Pi's track record, the software does come, it's just... often the solutions are only useful in tiny niche use cases.

Conclusion

8 GB of RAM is useful, but it's not quite enough to give this HAT an advantage over just paying for the 16 GB Pi 5, which is more flexible and runs models faster.

The main use case for this HAT might be in power-constrained applications where you need both vision processing and inferencing. But even there... it's hard to say "yes, buy this thing", because for just a few more watts, the Pi could achieve better performance for inference in tandem with the $70 AI Camera or the $110 AI HAT+ for the vision processing.

Outside of running tiny LLMs in less than 10 watts, maybe the idea is you use the AI HAT+ 2 as a development kit for designing devices using the 10H like self-checkout scanners (which might not even run on a Pi)? I'm not sure.
