💫 Project Page | Models & Bench 🤗 🤖 | 🚀 Demo | 📚 Cookbooks
- [2026.02.15] 🔥🔥 Release our Technical Report!!
- [2026.02.09] 🔥🔥 Release our code and model checkpoints!!
We present RynnBrain, an embodied foundation model grounded in physical reality. RynnBrain is available in two dense variants (2B and 8B) and one mixture-of-experts (MoE) model (30B-A3B). In addition, we release three post‑trained models: RynnBrain‑Plan (robot task planning), RynnBrain‑Nav (vision-language navigation), and RynnBrain‑CoP (chain-of-point reasoning).
- Comprehensive egocentric understanding: Excels in fine-grained video understanding and egocentric cognition, covering tasks such as embodied QA, counting, and OCR.
- Diverse spatio-temporal localization: Possesses powerful localization capabilities across episodic memory, enabling precise identification of objects, target areas, and motion trajectories.
- Physical-space reasoning: Employs an interleaved reasoning strategy that alternates between textual and spatial grounding, ensuring that its reasoning processes are firmly rooted in the physical environment.
- Physics-aware precise planning: Integrates localized affordances and object information into planning, enabling downstream VLA models to execute intricate tasks with fine-grained instructions.
RynnBrain employs a unified encoder-decoder architecture (supporting both Dense and MoE variants) to transform omni-vision inputs and textual instructions into multi-modal outputs, including spatial trajectories, physical pointing, and action planning. Through massive training on rich spatio-temporal, physical-space, and general knowledge data, RynnBrain maintains robust general-purpose capabilities while specializing in diverse, fine-grained embodied reasoning and complex planning tasks.
- General Embodied Understanding
- Vision-Language Navigation
| Model | Base Model | HuggingFace | ModelScope |
|---|---|---|---|
| RynnBrain-2B | Qwen3-VL-2B-Instruct | Link | Link |
| RynnBrain-8B | Qwen3-VL-8B-Instruct | Link | Link |
| RynnBrain-30B-A3B | Qwen3-VL-30B-A3B-Instruct | Link | Link |
| RynnBrain‑CoP-8B | RynnBrain-8B | Link | Link |
| RynnBrain‑Plan-8B | RynnBrain-8B | Link | Link |
| RynnBrain‑Plan-30B-A3B | RynnBrain-30B-A3B | Link | Link |
| RynnBrain‑Nav-8B | RynnBrain-8B | Link | Link |
Minimal dependencies:

```bash
pip install transformers==4.57.1
```

Run text generation:

```python
from transformers import AutoModelForImageTextToText

# Load a RynnBrain checkpoint (fill in the local path or Hugging Face repo id)
model = AutoModelForImageTextToText.from_pretrained("")
...
```
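For a fuller end-to-end call, here is a minimal sketch assuming the released checkpoints follow the standard Transformers image-text-to-text interface inherited from Qwen3-VL (chat-template processor plus `generate`); the model id and image URL below are placeholders, not the actual release names.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Placeholder id -- substitute the local path or Hugging Face repo id of the checkpoint you downloaded.
model_id = "path/to/RynnBrain-8B"

model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map="auto" requires `accelerate`
)
processor = AutoProcessor.from_pretrained(model_id)

# One single-image chat turn; video inputs follow the same message format with {"type": "video", ...}.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/kitchen.jpg"},  # placeholder URL
            {"type": "text", "text": "Which object should the robot grasp first to clear the table?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Drop the prompt tokens so only the newly generated answer is decoded.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```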
Check out the cookbooks that showcase RynnBrain's capabilities in cognition, localization, reasoning, and planning.
Pretraining & Evaluation
Please refer to RynnScale for details of pretraining and evaluation.
Finetuning
- Reasoning: RynnBrain introduces an interleaved reasoning approach that combines grounding with textual information directly within egocentric video streams. This paradigm effectively bridges the cognitive gap between language and the physical world, ensuring the reasoning process is robustly anchored in reality.
- Navigation: We trained a vision-language navigation model based on the RynnBrain base model. Empirical evaluation demonstrates that fine-tuning the vision-language model on RynnBrain yields superior performance compared to fine-tuning on other foundation models.
- Planning: RynnBrain integrates the location information of affordances, areas, and objects directly into its planning outputs. Consequently, even highly intricate and fine-grained tasks can be effectively addressed within our hierarchical RynnBrain-VLA system architecture (an illustrative sketch follows below).
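To make the idea concrete, the sketch below is a purely illustrative, hypothetical representation of a grounded plan whose steps carry explicit spatial references for a downstream VLA executor; none of the class or field names come from the RynnBrain codebase.

```python
from dataclasses import dataclass

# All names and fields below are hypothetical; they only illustrate the idea of a plan
# whose steps carry explicit spatial grounding that a downstream VLA policy can consume.

@dataclass
class GroundedRef:
    label: str                  # e.g. "mug handle" (affordance) or "left tray" (target area)
    point: tuple[float, float]  # normalized (x, y) in the current camera frame

@dataclass
class PlanStep:
    instruction: str            # fine-grained sub-instruction for the VLA executor
    refs: list[GroundedRef]     # spatial grounding attached to this step

plan = [
    PlanStep("Grasp the mug by its handle", [GroundedRef("mug handle", (0.62, 0.48))]),
    PlanStep("Place the mug on the left tray", [GroundedRef("left tray", (0.21, 0.55))]),
]

for step in plan:
    # A hierarchical system would dispatch each grounded sub-instruction to the low-level policy.
    print(step.instruction, [(r.label, r.point) for r in step.refs])
```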
We introduce RynnBrain-Bench, a multi-dimensional benchmark for embodied understanding that evaluates models across four key dimensions: object cognition, spatial cognition, grounding, and pointing, with an emphasis on fine-grained understanding and spatio-temporal localization across episodic video sequences.
For details, please refer to RynnBrain-Bench.
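If the benchmark is distributed as a Hugging Face dataset (the repo id below is a placeholder, not a confirmed name), loading it for evaluation would look roughly like this:

```python
from datasets import load_dataset

# Placeholder repo id -- replace with the actual RynnBrain-Bench dataset id once released.
bench = load_dataset("path/to/RynnBrain-Bench", split="test")

# Inspect one sample; each example is expected to pair an egocentric clip with a question
# targeting one of the four dimensions (object cognition, spatial cognition, grounding, pointing).
print(bench[0])
```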
💡 Some other multimodal-LLM projects from our team may interest you ✨.
RynnEC: Bringing MLLMs into Embodied World
Ronghao Dang*, Yuqian Yuan*, Yunxuan Mao*, Kehan Li*, Jiangpin Liu, Zhikai Wang, Fan Wang, Deli Zhao, Xin Li
RynnScale
RynnScale Team
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang, Siteng Huang, Shengke Xue, Yaxi Zhao, Jun Cen, Sicong Leng, Kehan Li, Jiayan Guo, Kexiang Wang, Mingxiu Chen, Fan Wang, Deli Zhao, Xin Li
RynnVLA-002: A Unified Vision-Language-Action and World Model
Jun Cen, Siteng Huang, Yuqian Yuan, Kehan Li, Hangjie Yuan, Chaohui Yu, Yuming Jiang, Jiayan Guo, Xin Li, Hao Luo, Fan Wang, Deli Zhao, Hao Chen
RynnRCP: Open Robotics Context Protocol and RobotMotion
RynnBot Team
RynnMotion: All-In-One Toolkit for Fast Robot Prototyping and Heterogeneous Teleoperation
RynnBot Team
Our RynnBrain is built on top of Qwen3-VL. We also learned a lot from the implementation of RynnEC and VideoRefer. If your work is used in RynnBrain but not mentioned in either this repo or the technical report, feel free to let us know ❤️.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.