Can I run AI locally?

Original link: https://www.canirun.ai/

## CanIRun.ai: An AI Model Compatibility Checker

CanIRun.ai helps users determine which AI models their computer can run locally. The platform provides a comprehensive, sortable list of models, from tiny 0.8B-parameter models up to massive 671B+-parameter options, with details such as VRAM requirements, context window sizes, and release dates.

The site categorizes models by provider (e.g., Meta, Alibaba, Google) and by task suitability (chat, code, reasoning, vision). It also lists the available quantization options (Q2_K through F16), which trade quality for size and speed. Each model is graded from S to F for hardware compatibility, indicating whether it runs smoothly, barely runs, or is too demanding.

The data is sourced from llama.cpp, Ollama, and LM Studio, giving a broad overview of the field and letting users explore what local AI makes possible.
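The size figures that drive those grades follow from simple arithmetic: a model's weights occupy roughly parameters × bits-per-weight ÷ 8 bytes, and the quantization level sets the bits per weight. Here is a minimal sketch of that estimate in Python; the bits-per-weight values for the llama.cpp quantization formats and the flat KV-cache overhead are rough assumptions of mine, not CanIRun.ai's exact formula:

```python
# Rough VRAM estimate for a locally run, GGUF-quantized model.
# Bits-per-weight values are approximations for common llama.cpp
# quantization formats (assumed here, not taken from CanIRun.ai).
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,    # smallest files, largest quality loss
    "Q4_K_M": 4.8,  # the usual size/quality sweet spot
    "Q8_0": 8.5,    # near-lossless
    "F16": 16.0,    # full half precision
}

def estimated_vram_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Weights footprint plus a flat allowance for KV cache and buffers."""
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8  # Gbit -> GB
    return weights_gb + overhead_gb

def verdict(params_b: float, quant: str, vram_gb: float) -> str:
    """Crude pass/fail in the spirit of the site's S-F letter grades."""
    need = estimated_vram_gb(params_b, quant)
    if need <= 0.8 * vram_gb:
        return "runs smoothly"
    if need <= vram_gb:
        return "barely fits"
    return "too demanding"

# e.g. Llama 3.1 8B at Q4_K_M on a 12 GB GPU
print(f"{estimated_vram_gb(8, 'Q4_K_M'):.1f} GB needed:", verdict(8, "Q4_K_M", 12))
```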


Original article

Can I Run AI locally?

Find out which AI models your machine can actually run.
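The first input to that question is how much VRAM the machine actually has. As an illustration, one way to read it on an NVIDIA GPU from Python (this assumes the nvidia-smi CLI is on the PATH; the query flags used are standard nvidia-smi options):

```python
import subprocess

def total_vram_gb() -> float:
    """Total memory of the first NVIDIA GPU, via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    mib = float(out.splitlines()[0])  # one line per GPU, value in MiB
    return mib * 1024**2 / 1e9        # MiB -> decimal GB

print(f"Total VRAM: {total_vram_gb():.1f} GB")
```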


| Model | Released | Provider | Params | Size | Context | Notes |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | 1 year ago | Meta | 8B | 4.1 GB | 128K | Meta's versatile 8B; great quality/speed ratio |
| Qwen 3.5 9B | 1 month ago | Alibaba | 9B | 4.6 GB | 32K | Multimodal Qwen 3.5 mid-size |
| Phi-4 14B | 1 year ago | Microsoft | 14B | 7.2 GB | 16K | Microsoft's reasoning-focused model |
| GPT-OSS 20B | 7 months ago | OpenAI | 21B | 10.8 GB | 128K | OpenAI's open-weight MoE with configurable reasoning |
| Mistral Small 3.1 24B | 1 year ago | Mistral AI | 24B | 12.3 GB | 128K | Multimodal Mistral with vision support |
| Gemma 3 27B | 1 year ago | Google | 27B | 13.8 GB | 128K | Google's flagship Gemma 3 model |
| Qwen 2.5 Coder 32B | 1 year ago | Alibaba | 32B | 16.4 GB | 128K | Best open-source coding model at release |
| Qwen 3 32B | 11 months ago | Alibaba | 32B | 16.4 GB | 128K | Qwen 3 flagship dense model |
| DeepSeek R1 Distill 32B | 1 year ago | DeepSeek | 32B | 16.4 GB | 64K | R1 reasoning distilled into Qwen 32B; a sweet spot |
| Llama 3.3 70B | 1 year ago | Meta | 70B | 35.9 GB | 128K | Best open model in the 70B class |
| Llama 4 Scout 17B | 11 months ago | Meta | 109B | 55.8 GB | 128K | MoE with 16 experts, 17B active params |
| GPT-OSS 120B | 7 months ago | OpenAI | 117B | 59.9 GB | 128K | OpenAI's flagship open-weight MoE; 52.6% SWE-bench |
| Devstral 2 123B | 3 months ago | Mistral AI | 123B | 63 GB | 256K | Dense 123B coding model; 72.2% SWE-bench Verified |
| DeepSeek R1 | 1 year ago | DeepSeek | 671B | 343.7 GB | 64K | Massive MoE reasoning model; 37B active |
| DeepSeek V3.2 | 3 months ago | DeepSeek | 685B | 350.9 GB | 128K | State-of-the-art MoE; 37B active params |
| Kimi K2 | 8 months ago | Moonshot AI | 1T | 512.2 GB | 128K | 1T-param MoE with 384 experts; 32B active, strong agentic coding |

## All models

| Model | Released | Provider | Params | Size | Context | Notes |
|---|---|---|---|---|---|---|
| Qwen 3.5 0.8B | 1 month ago | Alibaba | 0.8B | 0.5 GB | 32K | Ultra-tiny model for embedded and edge |
| Llama 3.2 1B | 1 year ago | Meta | 1B | 0.5 GB | 128K | Meta's smallest Llama for edge devices |
| Gemma 3 1B | 1 year ago | Google | 1B | 0.5 GB | 32K | Google's tiny Gemma for on-device |
| TinyLlama 1.1B | 2 years ago | Community | 1.1B | 0.6 GB | 2K | Ultralight model for constrained devices |
| Qwen 2.5 Coder 1.5B | 1 year ago | Alibaba | 1.5B | 0.8 GB | 32K | Ultra-lightweight coding model |
| DeepSeek R1 1.5B | 1 year ago | DeepSeek | 1.5B | 0.8 GB | 64K | Tiny reasoning model distilled from R1 |
| Qwen 3 1.7B | 11 months ago | Alibaba | 1.7B | 0.9 GB | 32K | Compact multilingual Qwen 3 |
| Qwen 3.5 2B | 1 month ago | Alibaba | 2B | 1 GB | 32K | Small multimodal Qwen 3.5 |
| Gemma 2 2B | 1 year ago | Google | 2B | 1 GB | 8K | Google's compact open model |
| Llama 3.2 3B | 1 year ago | Meta | 3B | 1.5 GB | 128K | Lightweight Llama for mobile and edge |
| SmolLM3 3B | 8 months ago | HuggingFace | 3B | 1.5 GB | 128K | Lightweight multilingual reasoning |
| Phi-3.5 Mini | 1 year ago | Microsoft | 3.8B | 1.9 GB | 128K | Microsoft's efficient small model with long context |
| Phi-4 Mini Reasoning | 11 months ago | Microsoft | 3.8B | 1.9 GB | 16K | Lightweight reasoning model |
| Qwen 3 4B | 11 months ago | Alibaba | 4B | 2 GB | 32K | Compact Qwen 3 for general tasks |
| Gemma 3 4B | 1 year ago | Google | 4B | 2 GB | 128K | Multimodal Gemma with 128K context |
| Qwen 3.5 4B | 1 month ago | Alibaba | 4B | 2 GB | 32K | Small multimodal Qwen 3.5 |
| Mistral 7B v0.3 | 1 year ago | Mistral AI | 7B | 3.6 GB | 32K | High-quality 7B with sliding window attention |
| Qwen 2.5 7B | 1 year ago | Alibaba | 7B | 3.6 GB | 128K | Strong multilingual and coding capabilities |
| Qwen 2.5 Coder 7B | 1 year ago | Alibaba | 7B | 3.6 GB | 128K | Dedicated coding model |
| DeepSeek R1 Distill 7B | 1 year ago | DeepSeek | 7B | 3.6 GB | 64K | R1 reasoning distilled into Qwen 7B |
| Qwen 3 8B | 11 months ago | Alibaba | 8B | 4.1 GB | 128K | Qwen 3 with thinking mode support |
| Ministral 8B | 1 year ago | Mistral AI | 8B | 4.1 GB | 32K | Mistral's efficient 8B model |
| Gemma 2 9B | 1 year ago | Google | 9B | 4.6 GB | 8K | Google's best mid-size open model |
| GLM-4 9B | 1 year ago | Zhipu AI | 9B | 4.6 GB | 128K | Multilingual model supporting 26 languages with 128K context |
| Nemotron Nano 9B v2 | 9 months ago | NVIDIA | 9B | 4.6 GB | 128K | Hybrid Mamba2 architecture for reasoning |
| Llama 3.2 11B Vision | 1 year ago | Meta | 11B | 5.6 GB | 128K | Multimodal vision and text model |
| Gemma 3 12B | 1 year ago | Google | 12B | 6.1 GB | 128K | Multimodal Gemma with 128K context |
| Mistral Nemo 12B | 1 year ago | Mistral AI | 12B | 6.1 GB | 128K | Multilingual 12B with 128K context |
| Qwen 2.5 14B | 1 year ago | Alibaba | 14B | 7.2 GB | 128K | Excellent quality for its size class |
| Qwen 3 14B | 11 months ago | Alibaba | 14B | 7.2 GB | 128K | Strong all-rounder with thinking mode |
| DeepSeek R1 Distill 14B | 1 year ago | DeepSeek | 14B | 7.2 GB | 64K | R1 reasoning distilled into Qwen 14B |
| LFM2 24B | 4 months ago | Liquid AI | 24B | 12.3 GB | 32K | Hybrid MoE with convolution+attention layers; 2.3B active |
| Devstral Small 2 24B | 3 months ago | Mistral AI | 24B | 12.3 GB | 256K | Coding-focused model with 256K context; 68% SWE-bench |
| Gemma 2 27B | 1 year ago | Google | 27B | 13.8 GB | 8K | Google's largest Gemma 2 model |
| Qwen 3.5 27B | 1 month ago | Alibaba | 27.8B | 14.2 GB | 256K | Flagship native multimodal Qwen 3.5 |
| Qwen 3 30B-A3B | 11 months ago | Alibaba | 30B | 15.4 GB | 128K | MoE with only 3.3B active; extremely efficient |
| Nemotron 3 Nano 30B | 9 months ago | NVIDIA | 30B | 15.4 GB | 1024K | MoE with 1M context and 3B active |
| Qwen 2.5 32B | 1 year ago | Alibaba | 32B | 16.4 GB | 128K | High-quality reasoning and multilingual |
| EXAONE 4.0 32B | 8 months ago | LG AI | 32B | 16.4 GB | 128K | Hybrid reasoning, multilingual |
| OLMo 2 32B | 1 year ago | Allen AI | 32B | 16.4 GB | 4K | Fully open research model by Allen AI |
| Command R 35B | 2 years ago | Cohere | 35B | 17.9 GB | 128K | Optimized for retrieval-augmented generation |
| Qwen 3.5 35B-A3B | 1 month ago | Alibaba | 35B | 17.9 GB | 256K | Efficient multimodal MoE with 3B active |
| Mixtral 8x7B | 2 years ago | Mistral AI | 47B | 24.1 GB | 32K | MoE with 12.9B active params |
| Qwen 2.5 72B | 1 year ago | Alibaba | 72B | 36.9 GB | 128K | Alibaba's flagship open model |
| Qwen 3.5 122B-A10B | 1 month ago | Alibaba | 122B | 62.5 GB | 256K | Large multimodal MoE with 10B active |
| Mixtral 8x22B | 1 year ago | Mistral AI | 141B | 72.2 GB | 64K | Large MoE with 39B active params |
| Qwen 3 235B-A22B | 11 months ago | Alibaba | 235B | 120.4 GB | 128K | Massive MoE with 22B active; frontier quality |
| Qwen 3.5 397B-A17B | 1 month ago | Alibaba | 397B | 203.4 GB | 256K | Largest multimodal Qwen 3.5 MoE |
| Llama 4 Maverick 17B-128E | 11 months ago | Meta | 400B | 204.9 GB | 1024K | Multimodal MoE with 128 experts; 17B active, 1M context |
| Llama 3.1 405B | 1 year ago | Meta | 405B | 207.5 GB | 128K | Largest open-weight dense model by Meta |
| Qwen 3 Coder 480B | 8 months ago | Alibaba | 480B | 245.9 GB | 256K | Largest open coding MoE; 35B active |
| DeepSeek V3.1 | 7 months ago | DeepSeek | 671B | 343.7 GB | 128K | Improved V3 with hybrid thinking and tool use |

Data sourced from llama.cpp, Ollama and LM Studio.
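In the end, the "can I run it" check reduces to comparing a model's listed size against usable VRAM. A minimal sketch over a few rows from the tables above; the 20% headroom reserved for the KV cache and runtime buffers is my assumption, not the site's published rule:

```python
# (name, listed size in GB) pairs copied from the tables above.
MODELS = [
    ("Llama 3.2 3B", 1.5),
    ("Llama 3.1 8B", 4.1),
    ("Phi-4 14B", 7.2),
    ("Gemma 3 27B", 13.8),
    ("Llama 3.3 70B", 35.9),
    ("DeepSeek R1", 343.7),
]

def runnable(vram_gb: float, headroom: float = 0.8):
    """Yield models whose listed size fits in the usable share of VRAM."""
    for name, size_gb in MODELS:
        if size_gb <= vram_gb * headroom:
            yield name

for name in runnable(vram_gb=16):  # e.g. a 16 GB GPU
    print(name)
```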
