(Comments)

Original link: https://news.ycombinator.com/item?id=43464068

The Hacker News discussion centers on two newly released open-source Chinese models: Qwen2.5-VL-32B and DeepSeek-v3-0324. Simonw highlights the 32B model's capability, comparing it to early-2023 GPT-4 performance, and notes that it is small enough to run on a single GPU or a well-specced laptop. The thread also covers the practicalities of running these models, with users discussing quantization to lower VRAM requirements and their experience running different model sizes on GPUs such as the 4090. There are concerns that prompts sent to DeepSeek's free tier may be used for training, and alternative services such as OpenRouter and Deep Infra are mentioned. Other topics include the impact of multimodal capability on text performance and the possibility of "push polling" being used to manipulate future model iterations. Finally, users recommend open-webui and discuss models suited to RAG tasks on home PCs.


Original
Qwen2.5-VL-32B: Smarter and Lighter (qwenlm.github.io)
66 points by tosh 54 minutes ago | 19 comments

Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/


It seems that this free version "may use your prompts and completions to train new models".

https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free

Do you think this needs attention?



That's typical of the free options on OpenRouter; if you don't want your inputs used for training, use the paid one: https://openrouter.ai/deepseek/deepseek-chat-v3-0324


Since we are on HN here, I can highly recommend open-webui with an OpenAI-compatible provider. I've been running it with Deep Infra for more than a year now and am very happy. New models are usually available within one or two days of release. I also have some friends who use the service almost daily.
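For anyone curious what that setup looks like in practice, here is a minimal sketch of calling an OpenAI-compatible provider with the standard openai Python client; open-webui is configured with the same base URL and API key in its connection settings. The base URL and model id below are assumptions, not verified values, so check your provider's documentation:

```python
# Minimal sketch: any OpenAI-compatible provider works with the stock client,
# you only swap the base_url. The endpoint and model id here are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed Deep Infra endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",  # hypothetical model id on the provider
    messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
)
print(response.choices[0].message.content)
```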


Is there any reporting on whether the equivalent of "push polling" is widely used in an attempt to manipulate future iterations of models?


Pretty soon I won't be using any American models. It'll be a 100% Chinese open source stack.

The foundation model companies are screwed. Only shovel makers (Nvidia, infra companies) and product companies are going to win.



32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough you can run them on a single GPU or a reasonably well specced Mac laptop (32GB or more).


Or quantized on a 4090!


32B models don't fully fit in 16GB of VRAM. Still fine for higher-quality answers; worth the extra wait in some cases.


I don't think these models are GPT-4 level. Yes, they seem to be on benchmarks, but it's well known that model builders increasingly use A/B testing in dataset curation and synthesis (using GPT-4-level models) to optimize not just the benchmarks themselves but anything that could be benchmarked, like academic tasks.


I'm not talking about GPT-4o here - every benchmark I've seen has had the new models from the past ~12 months outperform the March 2023 GPT-4 model.

To pick just the most popular one, https://lmarena.ai/?leaderboard= has GPT-4-0314 ranked 83rd now.



Does anyone know how making the models multimodal impacts their text capabilities? The article is claiming this achieves good performance on pure text as well, but I'm curious if there is any analysis on how much impact it usually has.

I've seen some people claim it should make the models better at text, but I find that a little difficult to believe without data.



So today is Qwen. Tomorrow a new SOTA model from Google apparently, R2 next week.

We haven't hit the wall yet.



To clarify: Qwen is made by Alibaba Cloud.

(It's not mentioned anywhere in the blog post.)



Wish I knew better how to estimate what size video card one needs. The HuggingFace link says this is bfloat16, so at least 64GB?

I guess the -7B might run on my 16GB AMD card?



I wish they would start producing graphs with quantized version performances as well. What matters is RAM/bandwidth vs performance, not number of parameters.


You can run a 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that.

Also it's entirely possible to run a model that doesn't fit in available GPU memory, it will just be slower.
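A rough back-of-the-envelope sketch of where these numbers come from (weights only; KV cache and runtime overhead add a few more GB, which is why a 4-bit 32B model is still tight on a 16GB card):

```python
# Rough VRAM estimate for model weights only: params * bits-per-weight / 8.
# Ignores KV cache and framework overhead (assume an extra ~1-2 GB in practice).
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits, label in [(16, "bf16"), (8, "Q8"), (4, "Q4")]:
    print(f"32B @ {label}: ~{weight_vram_gb(32, bits):.0f} GB")
# 32B @ bf16: ~64 GB  -> needs multiple GPUs or a large unified-memory Mac
# 32B @ Q8:   ~32 GB
# 32B @ Q4:   ~16 GB  -> borderline on a 16 GB card once overhead is added
```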



deepseek-r1:14b / mistral-small:24b / qwen2.5-coder:14b fit in 16GB of VRAM with fast generation. 32B versions bleed into RAM and take a serious performance hit, but are still usable.


What is the recommended model for RAG over PDF text documents? I've seen some recommendations for Mistral:7b. Looking to run on a pedestrian consumer home PC (ollama) with an Nvidia 4060 Ti and a Ryzen 5700x.
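Not an answer on which model is best, but as a sketch of the workflow being asked about: a minimal local RAG loop against ollama's HTTP API might look like the following. The endpoints, model names (nomic-embed-text, mistral:7b), and the PDF-chunking step are assumptions; any instruct model that fits the card's VRAM should work for the generation step.

```python
# Minimal sketch of local RAG with ollama's HTTP API.
# Model names and chunking are assumptions -- adjust to what you have pulled.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

# chunks: text extracted from your PDFs ahead of time (e.g. with pypdf)
chunks = ["...pdf chunk 1...", "...pdf chunk 2..."]
index = [(c, embed(c)) for c in chunks]

def answer(question: str, k: int = 3) -> str:
    q = embed(question)
    # rank chunks by cosine similarity to the question, keep the top k
    top = sorted(index, key=lambda ce: -float(
        q @ ce[1] / (np.linalg.norm(q) * np.linalg.norm(ce[1]))))[:k]
    context = "\n\n".join(c for c, _ in top)
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "mistral:7b", "stream": False,
                            "prompt": f"Context:\n{context}\n\nQuestion: {question}"})
    return r.json()["response"]
```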
