Ternary Bonsai: Top Intelligence at 1.58 Bits

Original link: https://prismml.com/news/ternary-bonsai

## Ternary Bonsai: High-Performance, Low-Memory Language Models

PrismML has released Ternary Bonsai, a new family of 1.58-bit language models (8B, 4B, and 1.7B parameters) designed for efficient performance. Building on their earlier 1-bit Bonsai models, Ternary Bonsai uses ternary weights (-1, 0, +1) to strike a balance between memory use and accuracy.

These models have a memory footprint roughly 9x smaller than standard 16-bit models while *outperforming* many same-size peers on key benchmarks such as MMLU, GSM8K, and HumanEval+. The 8B model, for example, reaches an average benchmark score of 75.5, surpassing 1-bit Bonsai 8B and competing with larger models such as Qwen3 8B despite its much smaller size.

Ternary Bonsai also delivers impressive throughput and energy efficiency: on platforms such as the M4 Pro and iPhone 17 Pro Max, it runs up to 5x faster than 16-bit models while consuming 3-4x less energy. The models run natively on Apple devices via MLX and are released under the Apache 2.0 license. They mark a shift in the performance-size tradeoff, offering a substantially stronger model than the 1-bit family for a modest increase in memory.


Original Article

Today, we’re announcing Ternary Bonsai, a new family of 1.58-bit language models designed to balance strict memory constraints with high accuracy requirements.

This release builds on the efficiency frontier we began exploring with the recently released 1-bit Bonsai models. The 1-bit family showed that extreme compression could still produce commercially useful language models. Ternary Bonsai targets a different point on that curve: a modest increase in size for a meaningful gain in performance.

The models are available in three sizes: 8B, 4B, and 1.7B parameters. By using ternary weights {-1, 0, +1}, these models achieve a memory footprint approximately 9x smaller than standard 16-bit models while outperforming most peers in their respective parameter classes on standard benchmarks.
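The roughly 9x figure follows from a bits-per-weight accounting. A minimal sketch, assuming 1.58 bits per ternary weight plus one FP16 scale amortized over each group of 128 weights (the grouping the post describes):

```python
# Effective storage cost per weight for the ternary scheme:
# 1.58 bits for the ternary code itself, plus one FP16 (16-bit)
# scale factor shared by each group of 128 weights.
ternary_bits = 1.58 + 16 / 128   # ~1.705 bits/weight
fp16_bits = 16.0                 # standard 16-bit baseline

ratio = fp16_bits / ternary_bits
print(round(ratio, 1))           # -> 9.4
```

This back-of-the-envelope ratio (~9.4x) matches the "approximately 9x smaller" claim; any per-tensor metadata would shave it down slightly.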

A true ternary model

Ternary Bonsai implements 1.58-bit representation throughout the entire network architecture. There are no higher-precision escape hatches. Embeddings, attention layers, MLPs, and the LM head all use the same 1.58-bit representation.

The models employ a group-wise quantization scheme in which each weight is constrained to one of three values: {-s, 0, +s}. These three states are encoded as (-1, 0, +1) using 1.58 bits per weight, together with a shared FP16 scale factor (s) for each group of 128 weights.
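A minimal NumPy sketch of this group-wise scheme. The choice of the scale `s` as each group's mean absolute value is an assumption (a common choice for ternary quantization); the post does not specify how `s` is computed:

```python
import numpy as np

def ternary_quantize(w, group_size=128):
    """Group-wise ternary quantization: each group of `group_size`
    weights maps to {-s, 0, +s}, with one shared FP16 scale s per group.
    Using the group's mean absolute value for s is an assumption."""
    flat = w.reshape(-1, group_size)
    # One scale per group, stored in half precision.
    s = np.abs(flat).mean(axis=1, keepdims=True).astype(np.float16)
    # Round w/s to the nearest of {-1, 0, +1}.
    q = np.clip(np.round(flat / s.astype(np.float32)), -1, 1).astype(np.int8)
    return q, s

def dequantize(q, s):
    """Recover the {-s, 0, +s} representation used at inference time."""
    return q.astype(np.float32) * s.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 128)).astype(np.float32)
q, s = ternary_quantize(w)
print(np.unique(q).tolist())   # -> [-1, 0, 1]
```

The int8 array here is only for illustration; a real kernel would pack the trits densely (e.g. five ternary values per byte, since 3^5 = 243 ≤ 256) to reach the ~1.58 bits/weight figure.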

Benchmark performance

Compared to the 1-bit Bonsai 8B, the Ternary Bonsai 8B scores 5 points higher on average across benchmarks, while requiring only 600MB more memory.

Ternary Bonsai 8B (1.75 GB) reaches a 75.5 average benchmark score, compared with 70.5 for 1-bit Bonsai 8B (1.15 GB). Among its peers, it trails only Qwen3 8B (16.38 GB) and outperforms all other models, despite being 9-10x smaller. It posts competitive results across MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFCLv3, showing that the gain is broad rather than concentrated in a single benchmark.

Fig I: The benchmark scores of Ternary Bonsai 8B compared to other models in the same parameter class.

The intelligence density of the Ternary Bonsai models continues to significantly outperform that of other models in comparable parameter classes.

Fig II: Intelligence density (per GB) of Ternary Bonsai 8B compared to other models in the same parameter class.

Extending the Pareto frontier

Fig III: Performance vs size (log scale) comparison of the 1-bit Bonsai family relative to models across multiple size classes.

Our earlier 1-bit Bonsai models established a new Pareto frontier for language model capability versus size. Ternary Bonsai shifts that frontier even further left.

That makes it a useful addition to the Bonsai family, and not a replacement for 1-bit Bonsai. In settings where the smallest possible footprint is the priority, 1-bit remains the right choice. However, where a small increase in memory can justify a substantially stronger model, Ternary Bonsai offers an alternative tradeoff. The 1.7B, 4B, and 8B variants extend that tradeoff across multiple deployment tiers, giving developers more flexibility in how they allocate memory, throughput, and model quality.

Throughput and energy use

Fig IV: Throughput (toks/sec) and energy consumption (mWh/tok) across various hardware platforms.

The new models also deliver strong throughput in practice. On an M4 Pro, Ternary Bonsai 8B runs at 82 toks/sec, roughly 5x faster than a 16-bit 8B model; on an iPhone 17 Pro Max, it runs at 27 toks/sec. The models also use substantially less energy than their 16-bit full-precision counterparts, delivering roughly 3-4x better energy efficiency: 0.105 mWh/tok on the M4 Pro and just 0.132 mWh/tok on the iPhone 17 Pro Max.
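To put the per-token figures in perspective, a rough energy budget for generating one million tokens, using the reported mWh/tok numbers:

```python
# Reported energy per generated token (mWh/tok).
m4_pro = 0.105      # M4 Pro
iphone = 0.132      # iPhone 17 Pro Max
tokens = 1_000_000

# Convert mWh -> Wh by dividing by 1000.
print(round(m4_pro * tokens / 1000, 1), "Wh on M4 Pro")
print(round(iphone * tokens / 1000, 1), "Wh on iPhone 17 Pro Max")
```

That works out to roughly 105 Wh and 132 Wh per million tokens respectively, on the order of one to two full charges of a typical laptop battery.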

Platform coverage

Ternary Bonsai models run natively on Apple devices (Mac, iPhone, iPad) via MLX. Model weights are available today under the Apache 2.0 License.

Full technical details of our training, evaluation, and benchmarking processes are available in our whitepaper.

Join us

PrismML emerged from a team of Caltech researchers and was founded with support from Khosla Ventures, Cerberus and Google. We’ve spent years tackling one of the field’s hardest problems: compressing neural networks without sacrificing their reasoning ability.

If you want to help build the next generation of state-of-the-art AI, we’d love to hear from you. Check out our careers page.
