Salomi, a research repo on extreme low-bit transformer quantization

Original link: https://github.com/OrionsLock/SALOMI

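SALOMI is a research repository that investigates extreme low-bit Transformer quantization. Its specific question is whether binary or near-binary weight representations can compete with ternary methods. It provides tools for quantization, inference, evaluation, and experimentation, but it is meant as a research workspace rather than a ready-to-use package. The main finding is that strict 1-bit post-hoc quantization is not viable for GPT-2-class language modeling. Slightly higher bit rates (~1.2-1.35 bpp), reached with techniques such as Hessian-guided vector quantization, give more practical results. The repository includes extensive documentation. `RESEARCH.md` gives a comprehensive overview, and `HONEST_ASSESSMENT.md` offers a realistic appraisal of the results. Historical experiment files are included, but users are advised to prioritize the curated documents and validated tests for the most accurate picture of the project's current conclusions. The code is licensed under Apache-2.0.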
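
Hessian-guided vector quantization splits weights into short vectors and snaps each vector to a shared codeword. The error is weighted by how sensitive the loss is to each weight. The sketch below is a simplified version, not SALOMI's algorithm, and it rests on a few assumptions. It uses a diagonal Hessian approximation with strictly positive per-weight importances `H`. The clustering step is a weighted k-means with deterministic farthest-point seeding. All function names are hypothetical.

```python
from typing import List, Tuple

Vec = List[float]


def weighted_dist(w: Vec, c: Vec, h: Vec) -> float:
    """Diagonal-Hessian proxy for the loss increase: sum_j h_j * (w_j - c_j)^2."""
    return sum(hj * (wj - cj) ** 2 for wj, cj, hj in zip(w, c, h))


def farthest_point_init(W: List[Vec], H: List[Vec], k: int) -> List[Vec]:
    """Deterministic seeding: start at W[0], then repeatedly add the vector
    whose nearest existing codeword is farthest under the weighted distance."""
    codebook = [list(W[0])]
    while len(codebook) < k:
        far = max(
            range(len(W)),
            key=lambda i: min(weighted_dist(W[i], c, H[i]) for c in codebook),
        )
        codebook.append(list(W[far]))
    return codebook


def hessian_weighted_vq(
    W: List[Vec], H: List[Vec], k: int, iters: int = 20
) -> Tuple[List[Vec], List[int]]:
    """Weighted k-means: high-importance weights pull codewords harder.
    Assumes every entry of H is strictly positive."""
    codebook = farthest_point_init(W, H, k)
    dim = len(W[0])
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest codeword under the Hessian-weighted distance.
        assign = [
            min(range(k), key=lambda j: weighted_dist(w, codebook[j], h))
            for w, h in zip(W, H)
        ]
        # Update step: the per-coordinate weighted mean minimizes
        # sum over members i of H[i][d] * (W[i][d] - c[d])^2.
        for j in range(k):
            members = [i for i, a in enumerate(assign) if a == j]
            if not members:
                continue  # leave an empty codeword where it is
            codebook[j] = [
                sum(H[i][d] * W[i][d] for i in members)
                / sum(H[i][d] for i in members)
                for d in range(dim)
            ]
    return codebook, assign


if __name__ == "__main__":
    W = [[1.0, 1.0], [1.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
    H = [[1.0, 1.0]] * 4
    codebook, assign = hessian_weighted_vq(W, H, k=2)
    print(assign)  # [0, 0, 1, 1]
```

With k codewords over d-dimensional vectors, each vector's index costs log2(k)/d bits per weight before codebook overhead. The choice of k and d therefore sets the bit rate.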

Original text

SALOMI is a research repository focused on extreme low-bit transformer quantization and inference, especially the question of whether binary or near-binary weight representations can approach or exceed ternary baselines under realistic evaluation.
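
To make the binary-versus-ternary question concrete, here is a minimal sketch that compares the reconstruction error of the two codes on a toy weight vector. It assumes simple per-tensor scaling. The binary scale uses the XNOR-Net-style mean-absolute-value rule, and the ternary threshold uses the 0.7 × mean|w| heuristic from Ternary Weight Networks. Neither rule is confirmed as SALOMI's own method, and the function names are hypothetical.

```python
from typing import List, Tuple


def binarize(w: List[float]) -> Tuple[float, List[int]]:
    """1-bit code: w_i ~ alpha * s_i with s_i in {-1, +1}.
    For fixed signs, alpha = mean(|w_i|) minimizes the squared error."""
    signs = [1 if x >= 0 else -1 for x in w]
    alpha = sum(abs(x) for x in w) / len(w)
    return alpha, signs


def ternarize(w: List[float], delta_ratio: float = 0.7) -> Tuple[float, List[int]]:
    """Ternary code: w_i ~ alpha * t_i with t_i in {-1, 0, +1}.
    Weights with |w_i| <= delta_ratio * mean(|w|) are zeroed; alpha is the
    mean magnitude of the surviving weights (optimal for those fixed codes)."""
    delta = delta_ratio * sum(abs(x) for x in w) / len(w)
    codes = [0 if abs(x) <= delta else (1 if x > 0 else -1) for x in w]
    kept = [abs(x) for x, t in zip(w, codes) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, codes


def sq_error(w: List[float], alpha: float, codes: List[int]) -> float:
    """Squared reconstruction error of alpha * codes against w."""
    return sum((x - alpha * c) ** 2 for x, c in zip(w, codes))


if __name__ == "__main__":
    w = [0.9, -0.05, 0.02, -1.1, 0.5, -0.01]
    for name, fn in (("binary", binarize), ("ternary", ternarize)):
        alpha, codes = fn(w)
        print(name, codes, round(sq_error(w, alpha, codes), 4))
```

On this toy vector the ternary code has the lower error because it can zero out near-zero weights. That is the gap a binary or near-binary representation has to close.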

This repository contains:

the onebit/ package for quantization, runtime inference, evaluation, kernels, and related tooling,
a large tests/ tree for validation and experimentation,
research writeups under docs/,
and historical paper-style materials under onebit/research/paper/.

This repository is best treated as a research workspace rather than a one-command product package.

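Typical setup (the activation line below is for Windows; on Linux or macOS, run source .venv/bin/activate instead):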

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
pytest

Notes:

pyopencl is optional unless you want to explore the OpenCL backend.
some research scripts expect Hugging Face model/data downloads and may require extra environment setup or credentials depending on your machine state.
for a guided overview, read RESEARCH.md before running older experiment scripts.
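
Because pyopencl is optional, code that touches an OpenCL path can check for it instead of failing at import time. This is a minimal sketch that uses only the standard library. The helper names and the backend labels "opencl" and "cpu" are hypothetical and are not taken from the onebit/ package.

```python
import importlib.util


def opencl_available() -> bool:
    """True if the optional pyopencl package is importable (it is not imported here)."""
    return importlib.util.find_spec("pyopencl") is not None


def pick_backend(prefer_opencl: bool = True) -> str:
    """Hypothetical backend selector: use OpenCL only when requested and installed."""
    if prefer_opencl and opencl_available():
        return "opencl"
    return "cpu"


if __name__ == "__main__":
    print("pyopencl installed:", opencl_available())
    print("selected backend:", pick_backend())
```

Probing with `find_spec` avoids paying the import cost, or hitting OpenCL driver errors, on machines that never use that backend.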

This is a research repository, not a polished production package.

The most important repo-level conclusion is:

strict 1.00 bpp post-hoc binary quantization does not hold up as a strong GPT-2–class language modeling solution under rigorous evaluation
more credible practical results in this repo cluster around ~1.2-1.35 bpp using Hessian-guided VQ, mixed precision, or magnitude-recovery methods
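
Simple bit accounting makes the 1.00 versus ~1.2-1.35 bpp comparison concrete. The sketch below assumes 16-bit scales and codewords. Its group sizes, fractions, and layer shape are illustrative, not any specific SALOMI configuration. Under these assumptions, any per-group scale, higher-precision subset, or codebook lifts a scheme above a strict 1.00 bpp.

```python
import math


def binary_bpp(group_size: int, scale_bits: int = 16) -> float:
    """Sign bits plus one scale per group of weights, amortized per weight."""
    return 1.0 + scale_bits / group_size


def mixed_precision_bpp(frac_high: float, high_bits: float, low_bits: float = 1.0) -> float:
    """Average bits when a fraction of weights keeps a higher-precision code
    (ignores the cost of recording which weights are high precision)."""
    return frac_high * high_bits + (1.0 - frac_high) * low_bits


def vq_bpp(codebook_size: int, vector_dim: int, n_weights: int, codeword_bits: int = 16) -> float:
    """Index bits per weight plus the codebook's storage amortized over the layer."""
    index_bits = math.log2(codebook_size) / vector_dim
    codebook_overhead = codebook_size * vector_dim * codeword_bits / n_weights
    return index_bits + codebook_overhead


if __name__ == "__main__":
    print("binary, fp16 scale per 128 weights:", binary_bpp(128))
    print("5% of weights at 4 bits:", round(mixed_precision_bpp(0.05, 4), 3))
    print("VQ, 512 codes x 8 dims, 768x768 layer:", round(vq_bpp(512, 8, 768 * 768), 3))
```

In the VQ case, 512 codewords over 8-dimensional vectors cost 9/8 = 1.125 index bits per weight. Amortizing the fp16 codebook over a 768 × 768 matrix adds about 0.11, for roughly 1.24 bpp.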

Recommended Reading Order

RESEARCH.md — comprehensive repo-level research report and maturity assessment
docs/HONEST_ASSESSMENT.md — strongest reality-check document
docs/PROJECT_ANALYSIS_SUMMARY.md — validation and failure-mode summary
docs/REPOSITORY_GUIDE.md — curated technical guide to the repository
docs/ARCHIVE.md — explanation of historical experiment files and naming
REPRODUCIBILITY.md — environment and rerun guidance
CONTRIBUTING.md — contribution and repo hygiene expectations

Some materials under onebit/research/paper/ preserve earlier, more optimistic draft claims. For the most defensible current interpretation of the repository, prefer:

over historical paper-draft numbers when they conflict.

What Makes This Public-Ready

This repo has been curated to improve GitHub readiness:

README.md gives the top-level framing
RESEARCH.md is the comprehensive research report
requirements.txt documents the dependency surface
.gitignore excludes common local caches and transient files
LICENSE now provides clear reuse terms under Apache-2.0

This repository is licensed under Apache-2.0. See LICENSE.

SALOMI/
├── README.md
├── RESEARCH.md
├── onebit/
├── docs/
├── tests/
└── research/result artifacts and experiment scripts

The strongest honest framing for this project is:

A serious research and systems exploration of extreme LLM quantization, including both promising methods and rigorous evidence about where naive sub-1-bit claims break down.

Some filenames, especially under onebit/research/, preserve the chronology of the work rather than an ideal public taxonomy. Names like novel_ideas_v*.py are intentionally kept as part of the research trail. Public-facing readers should prioritize the curated documents and validated test paths over historical experiment filenames.
