A popular 230B-parameter coding model that opts for a classic Transformer architecture instead of the newer hybrid-attention designs.
- Scale
- 230B total parameters, 10B active per token
- Date
- 2026-02-12
- Decoder type
- Sparse MoE
- Attention
- GQA with QK-Norm
- Key detail
- Deliberately avoids sliding-window and linear-attention hybrids, relying on full attention throughout while keeping the active path at 10B parameters per token.
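The sparse-MoE point above is what separates total from active parameters: a gate routes each token to a small top-k subset of experts, so only a fraction of the 230B weights run per token. A minimal NumPy sketch of top-k routing (the function name, shapes, and expert count are illustrative, not taken from the model):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route a token through the top_k highest-scoring experts.

    x: (d,) token activation; gate_w: (n_experts, d) router weights;
    experts: list of (d, d) expert matrices (toy stand-ins for expert FFNs).
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-top_k:]            # only top_k experts execute
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over the selected experts
    # Active compute = top_k / n_experts of the expert parameters;
    # in a 230B-total / 10B-active model that ratio is roughly 4%.
    return sum(w * (experts[i] @ x) for i, w in zip(top, weights))
```

With `top_k=1` this degenerates to picking the single best expert; real routers add load-balancing losses and shared experts, which are omitted here.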
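The attention entry combines two ideas: grouped-query attention (several query heads share one KV head, shrinking the KV cache) and QK-Norm (RMS-normalizing queries and keys before the dot product to bound attention logits). A toy NumPy sketch under assumed shapes; head counts and dimensions are illustrative:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # QK-Norm: scale the last dim to unit RMS (learned gains omitted)
    return x / np.sqrt((x * x).mean(-1, keepdims=True) + eps)

def gqa_qknorm(q, k, v, n_kv_heads):
    """Causal grouped-query attention with QK-Norm.

    q: (n_heads, T, d); k, v: (n_kv_heads, T, d), n_kv_heads divides n_heads.
    """
    n_heads, T, d = q.shape
    group = n_heads // n_kv_heads
    q, k = rms_norm(q), rms_norm(k)          # normalize before the dot product
    out = np.empty_like(q)
    causal = np.tril(np.ones((T, T), dtype=bool))
    for h in range(n_heads):
        kv = h // group                      # query heads in a group share one KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores = np.where(causal, scores, -np.inf)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[h] = w @ v[kv]
    return out
```

Because keys and values are stored once per KV head rather than per query head, the KV cache shrinks by the group factor, which is the main reason GQA is favored for long-context serving.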