SALOMI is a research repository focused on extreme low-bit transformer quantization and inference, especially the question of whether binary or near-binary weight representations can approach or exceed ternary baselines under realistic evaluation.
This repository contains:
- the `onebit/` package for quantization, runtime inference, evaluation, kernels, and related tooling,
- a large `tests/` tree for validation and experimentation,
- research writeups under `docs/`,
- and historical paper-style materials under `onebit/research/paper/`.
This repository is best treated as a research workspace rather than a one-command product package.
Typical setup:
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
pytest

Notes:
- `pyopencl` is optional unless you want to explore the OpenCL backend.
- some research scripts expect Hugging Face model/data downloads and may require extra environment setup or credentials depending on your machine state.
- for a guided overview, read `RESEARCH.md` before running older experiment scripts.
This is a research repository, not a polished production package.
The most important repo-level conclusion is:
- strict 1.00 bpp post-hoc binary quantization does not hold up as a strong GPT-2–class language modeling solution under rigorous evaluation
- more credible practical results in this repo cluster around ~1.2-1.35 bpp using Hessian-guided VQ, mixed precision, or magnitude-recovery methods
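As a rough illustration of how such bits-per-parameter (bpp) figures arise, the sketch below averages storage cost over groups of weights stored at different precisions. The function name `effective_bpp`, the layer size, the 90/10 split, and the one-16-bit-scale-per-row layout are all hypothetical choices for illustration, not values or APIs taken from this repository.

```python
def effective_bpp(groups, overhead_bits=0):
    """Average bits per parameter over groups of weights.

    groups: iterable of (num_params, bits_per_param) pairs.
    overhead_bits: extra storage such as scales or codebooks (assumed layout).
    """
    groups = list(groups)
    total_params = sum(n for n, _ in groups)
    if total_params == 0:
        raise ValueError("no parameters to average over")
    total_bits = sum(n * b for n, b in groups) + overhead_bits
    return total_bits / total_params


# Hypothetical 1,000,000-parameter layer: 90% binary, 10% kept at 4 bits.
mixed = effective_bpp([(900_000, 1), (100_000, 4)])

# Sign-only storage plus one 16-bit scale per 1000-weight row (assumed layout).
rows = 1_000_000 // 1000
binary_with_scales = effective_bpp([(1_000_000, 1)], overhead_bits=rows * 16)

print(f"mixed precision:    {mixed:.3f} bpp")
print(f"binary with scales: {binary_with_scales:.3f} bpp")
```

Under these assumed numbers, keeping a tenth of the weights at 4 bits lands at 1.3 bpp, and even sign-only storage rises slightly above 1.00 bpp once per-row scales are counted.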
- `RESEARCH.md`: comprehensive repo-level research report and maturity assessment
- `docs/HONEST_ASSESSMENT.md`: strongest reality-check document
- `docs/PROJECT_ANALYSIS_SUMMARY.md`: validation and failure-mode summary
- `docs/REPOSITORY_GUIDE.md`: curated technical guide to the repository
- `docs/ARCHIVE.md`: explanation of historical experiment files and naming
- `REPRODUCIBILITY.md`: environment and rerun guidance
- `CONTRIBUTING.md`: contribution and repo hygiene expectations
Some materials under `onebit/research/paper/` preserve earlier, more optimistic draft claims. For the most defensible current interpretation of the repository, prefer the curated documents listed in this README over historical paper-draft numbers when they conflict.
This repo has been curated to improve GitHub readiness:
- `README.md` gives the top-level framing
- `RESEARCH.md` is the comprehensive research report
- `requirements.txt` documents the dependency surface
- `.gitignore` excludes common local caches and transient files
- `LICENSE` now provides clear reuse terms under Apache-2.0
This repository is licensed under Apache-2.0. See LICENSE.
SALOMI/
├── README.md
├── RESEARCH.md
├── onebit/
├── docs/
├── tests/
└── research/result artifacts and experiment scripts
The strongest honest framing for this project is:
A serious research and systems exploration of extreme LLM quantization, including both promising methods and rigorous evidence about where naive sub-1-bit claims break down.
Some filenames, especially under onebit/research/, preserve the chronology of the work rather than an ideal public taxonomy. Names like novel_ideas_v*.py are intentionally kept as part of the research trail. Public-facing readers should prioritize the curated documents and validated test paths over historical experiment filenames.
1. `README.md`
2. `RESEARCH.md`
3. `docs/HONEST_ASSESSMENT.md`
4. `docs/PROJECT_ANALYSIS_SUMMARY.md`
5. `docs/REPOSITORY_GUIDE.md`
If you want the corrected, defensible story of the repo, read in that order before opening the historical paper drafts.