MDST Engine: run GGUF models in the browser with WebGPU/WASM

Original link: https://mdst.app/blog/mdst_engine_run_gguf_models_in_your_browser

## MDST: run local LLMs directly in your browser

MDST is a free, collaborative IDE that brings the power of large language models (LLMs) directly into your web browser. It uses WebGPU to run GGUF models, a popular and easy-to-download format, locally, eliminating the dependence on cloud providers and complicated setups.

This means anyone with a modern browser (Chrome, Safari, Edge) and reasonably recent hardware can download, run, and even *fine-tune* LLMs without a powerful server. MDST provides a secure, end-to-end encrypted environment for project syncing, real-time collaboration, and benchmarking models on a public WebGPU leaderboard.

With current support for models such as Qwen3, Ministral, and Gemma, alongside cloud options, MDST aims to democratize LLM access and research. Users can contribute benchmarks, earn research points, and help shape the project's future. MDST is well positioned to capture the growing demand for accessible, trustworthy local AI, making LLM experimentation and deployment simpler than ever.

From the Hacker News thread, the author (vmirnv) commented:

> Hi everyone! Just in case (a common question): we plan to open-source the MDST WebGPU engine, but please give us some time to polish it and get it ready for the demanding open-source community. In the meantime, you can skip the waitlist with this invite: hackernews_Eq1RDox. WebGPU technology lets us offer a genuinely free and accessible user tier for everyone without burning investor money or resorting to deception. Happy to answer any questions.

Original article
MDST Research module screenshot
Load, tune, run, and publish your own GGUF models in Chrome, Safari, or Edge.

TLDR: MDST brings GGUF, one of the most popular formats for LLMs, to WebGPU, so anyone can create, edit, and review files and collaborate from their browser without depending on cloud LLM providers or complicated setups.

In 2026, more people want local models they can actually run and trust, and the hardware and software are finally catching up. Better consumer GPUs, new models, and better quantizations are making “local” feel normal and accessible like never before.

So we built a WASM/JS engine that runs GGUF on WebGPU. GGUF is one of the most popular LLM formats and supports a variety of quantizations. Shipped as a single-file container, it is well suited to consumer-grade devices and easy to download, cache, and tune.
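To make the "single-file container" point concrete, here is a minimal sketch of reading a GGUF header in the browser. The URL is hypothetical and this is not MDST's engine code; only the magic bytes, version, tensor count, and metadata key/value count at the start of the header follow the public GGUF spec.

```ts
// Minimal sketch: fetch the first bytes of a GGUF file and read its header.
// The model URL is hypothetical; the header layout (magic, version,
// tensor count, metadata KV count) follows the public GGUF spec.
async function readGgufHeader(url: string) {
  // Only the first 24 bytes hold the fixed-size header fields. This assumes
  // the server honors Range requests; otherwise the whole file is fetched.
  const resp = await fetch(url, { headers: { Range: "bytes=0-23" } });
  const buf = await resp.arrayBuffer();
  const view = new DataView(buf);

  // GGUF files start with the ASCII magic "GGUF" (read as little-endian uint32).
  const magic = view.getUint32(0, true);
  if (magic !== 0x46554747) throw new Error("Not a GGUF file");

  return {
    version: view.getUint32(4, true),             // GGUF format version
    tensorCount: view.getBigUint64(8, true),      // number of tensors in the file
    metadataKvCount: view.getBigUint64(16, true), // metadata key/value pairs
  };
}

// Usage (hypothetical URL):
// const header = await readGgufHeader("https://example.com/qwen3-q4_k_m.gguf");
```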

We believe this will open a new, bigger market for GGUF: fast, local inference for anyone who just wants it to work, right in the browser.

MDST Engine in browser
Local inference in the browser is only going to get faster and more accessible.

What MDST is

MDST is a free, secure, collaborative IDE with cloud and local agentic inference integrated directly into your workspaces.

Instead of copying context between tools and teammates, MDST syncs, merges, and stores everything inside one or more projects, giving your team shared access to files, history, and full context, while keeping everything E2E encrypted and GDPR-compliant.

With MDST, you can:

  • Download and run LLMs in your browser with a single click: no complicated setup, and anyone can do it from any device that supports WebGPU (see the feature-detection sketch after this list).
  • Sync projects in real time with GitHub or your local filesystem; with MDST you'll never lose your work or any changes you've made.
  • Stay stable under load, without getting locked into a single provider's API mood swings or quality downgrades.
  • Keep files and conversations private, with end-to-end encryption as a first-class default. Signal-style privacy mode included.
  • Benchmark models where they run, with local runs feeding a public WebGPU leaderboard.
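Since "any device that supports WebGPU" is the only hard requirement, a page can check for support up front. This is a minimal sketch using the standard WebGPU API; it is not MDST's actual startup code.

```ts
// Minimal WebGPU capability check using the standard navigator.gpu API
// (types come from the @webgpu/types package). Illustrative only,
// not MDST's actual startup code.
async function webgpuAvailable(): Promise<boolean> {
  // Browsers without WebGPU simply don't expose navigator.gpu.
  if (!("gpu" in navigator)) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

// A caller could fall back to a WASM-only path when this returns false.
```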
MDST workspace
Earn research points by running and sharing your models.

Research and learning for everyone

From now on, all you need for local inference, LLM learning, and research is a modern browser (Chrome, Safari, Edge supported, Firefox coming soon), a laptop that is five years old or newer (an M1 MacBook Air works well with small models), and a GGUF model.
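The single-file format also keeps repeat loads fast with standard browser storage. Below is a hedged sketch using the Cache Storage API; the cache name and URL are illustrative, and MDST's actual persistence layer may differ.

```ts
// Sketch: cache a downloaded GGUF file with the browser Cache Storage API
// so later sessions skip the network. Cache name and URL are illustrative;
// MDST's actual caching strategy may differ.
async function fetchModelCached(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open("gguf-models-v1");
  let resp = await cache.match(url);
  if (!resp) {
    resp = await fetch(url);
    if (!resp.ok) throw new Error(`Download failed: ${resp.status}`);
    await cache.put(url, resp.clone()); // store a copy for next time
  }
  return resp.arrayBuffer();
}
```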

Open the Research module and run the local benchmark suite across tasks and difficulty levels to test and promote your model or sampling parameters on the public WebGPU leaderboard.

Every run happens on your own machine, in your browser, and produces results that remain comparable over time. The leaderboard ranks models by a weighted score across benchmark tasks and difficulty levels, so higher-difficulty tasks matter more than easy wins:

Disclaimer: work in progress. Scores, model sets, and engine tuning are subject to change.
Some models perform consistently across tasks and difficulty levels, while others vary more.
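To make the weighting concrete, here is a small sketch of what a difficulty-weighted score could look like. The weights and the pass-rate aggregation are assumptions for illustration; the post does not describe MDST's actual scoring formula.

```ts
// Sketch of a difficulty-weighted leaderboard score. The weights are
// assumptions for illustration; MDST's actual scoring may differ.
type BenchResult = { task: string; difficulty: 1 | 2 | 3; passRate: number };

function weightedScore(results: BenchResult[]): number {
  // Higher difficulty contributes more, e.g. weight = difficulty level.
  const weighted = results.reduce((s, r) => s + r.passRate * r.difficulty, 0);
  const totalWeight = results.reduce((s, r) => s + r.difficulty, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Example: a hard-task win moves the score more than an easy one.
// weightedScore([{ task: "code", difficulty: 3, passRate: 0.8 },
//                { task: "qa",   difficulty: 1, passRate: 0.9 }]) === 0.825
```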

Why now and what's next

Right now MDST already supports these cloud and local LLM families in different quantizations, and we're working on adding more:

| Type | Family | Models |
| --- | --- | --- |
| Cloud | Claude | Sonnet 4.5, Opus 4.6 |
| Cloud | OpenAI | GPT5.2, 5.1 Mini, Codex 5.2 |
| Cloud | Gemini | 3 Pro Preview |
| Cloud | Kimi | K2 |
| Cloud | DeepSeek | V3.2 |
| Local GGUF | Qwen3 | Thinking |
| Local GGUF | Ministral | 3 Instruct |
| Local GGUF | LFM | 2.5 |
| Local GGUF | Gemma3 | IT |

This combination gives both users and us a lot of flexibility to choose the best model for the task at hand, and to learn from experiments with different prompts and quantizations.

WebGPU is finally fast enough on mainstream hardware, and GGUF has become the simplest way to ship a quantized model as a single artifact. Put together, that makes real local inference in the browser practical and accessible for everyone, even on modest hardware.

We welcome everyone to contribute to the project and help steer the roadmap. Sign in for free or start as an early supporter, run your favorite GGUF model, contribute to the benchmark suite, or use the best cloud models for any of your tasks.

Your feedback and support will show us what to optimize next. Please share this post with the community; we'd appreciate your support!

Follow our progress on MDST Telegram and X channels.