(Comments)

Original link: https://news.ycombinator.com/item?id=43464068

The Hacker News discussion centers on two newly released open-source Chinese models: Qwen2.5-VL-32B and DeepSeek-v3-0324. Simonw highlights the 32B model's capability, comparing it to early-2023 GPT-4 performance, and notes that it is small enough to run on a single GPU or a well-specced laptop. The thread also covers the practicalities of running these models, with users discussing quantization to lower VRAM requirements and their experience running different model sizes on GPUs such as the 4090. There are concerns that prompts sent to DeepSeek's free tier may be used for training, and alternative services such as OpenRouter and Deep Infra are mentioned. Other topics include the impact of multimodal capability on text performance and the possibility of "push polling" being used to manipulate future model iterations. Finally, users recommend open-webui and discuss models suited to RAG tasks on home PCs.


Original
Qwen2.5-VL-32B: Smarter and Lighter (qwenlm.github.io)
66 points by tosh 54 minutes ago | 19 comments

Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/


It seems that this free version "may use your prompts and completions to train new models".

https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free

Do you think this needs attention?



That's typical of the free options on OpenRouter; if you don't want your inputs used for training, use the paid one: https://openrouter.ai/deepseek/deepseek-chat-v3-0324


Since we are on HN here, I can highly recommend open-webui with an OpenAI-compatible provider. I've been running it with Deep Infra for more than a year now and am very happy. New models are usually available within one or two days of release. I also have some friends who use the service almost daily.
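For anyone curious what that setup looks like in practice, here is a minimal sketch of calling an OpenAI-compatible provider with the standard openai Python client; open-webui is configured with the same base URL and API key in its connection settings. The base URL and model id below are assumptions, not verified values, so check your provider's documentation:

```python
# Minimal sketch: any OpenAI-compatible provider works with the stock client,
# you only swap the base_url. The endpoint and model id here are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed Deep Infra endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",  # hypothetical model id on the provider
    messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
)
print(response.choices[0].message.content)
```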


Is there any reporting on whether the equivalent of "push polling" is widely used in an attempt to manipulate future iterations of models?


Pretty soon I won't be using any American models. It'll be a 100% Chinese open source stack.

The foundation model companies are screwed. Only shovel makers (Nvidia, infra companies) and product companies are going to win.



32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4 March 2023 level performance, which is when LLMs first got really useful) but small enough you can run them on a single GPU or a reasonably well specced Mac laptop (32GB or more).


Or quantized on a 4090!


32B models don't fully fit in 16GB of VRAM. Still fine for higher-quality answers; worth the extra wait in some cases.


I don't think these models are GPT-4 level. Yes, they seem to be on benchmarks, but it's well known that model builders increasingly use A/B testing in dataset curation and synthesis (using GPT-4-level models) to optimize not just the benchmarks themselves but anything that could be benchmarked, like academic tasks.


I'm not talking about GPT-4o here - every benchmark I've seen has had the new models from the past ~12 months outperform the March 2023 GPT-4 model.

To pick just the most popular one, https://lmarena.ai/?leaderboard= has GPT-4-0314 ranked 83rd now.



Does anyone know how making the models multimodal impacts their text capabilities? The article is claiming this achieves good performance on pure text as well, but I'm curious if there is any analysis on how much impact it usually has.

I've seen some people claim it should make the models better at text, but I find that a little difficult to believe without data.



So today is Qwen. Tomorrow a new SOTA model from Google apparently, R2 next week.

We haven't hit the wall yet.



To clarify: Qwen is made by Alibaba Cloud.

(It's not mentioned anywhere in the blog post.)



Wish I knew better how to estimate what size video card one needs. The HuggingFace link says this is bfloat16, so at least 64GB?

I guess the -7B might run on my 16GB AMD card?



I wish they would start producing graphs with quantized version performances as well. What matters is RAM/bandwidth vs performance, not number of parameters.


You can run a 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that.

Also it's entirely possible to run a model that doesn't fit in available GPU memory, it will just be slower.
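A rough back-of-the-envelope sketch of where these numbers come from (weights only; KV cache and runtime overhead add a few more GB, which is why a 4-bit 32B model is still tight on a 16GB card):

```python
# Rough VRAM estimate for model weights only: params * bits-per-weight / 8.
# Ignores KV cache and framework overhead (assume an extra ~1-2 GB in practice).
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits, label in [(16, "bf16"), (8, "Q8"), (4, "Q4")]:
    print(f"32B @ {label}: ~{weight_vram_gb(32, bits):.0f} GB")
# 32B @ bf16: ~64 GB  -> needs multiple GPUs or a large unified-memory Mac
# 32B @ Q8:   ~32 GB
# 32B @ Q4:   ~16 GB  -> borderline on a 16 GB card once overhead is added
```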



deepseek-r1:14b / mistral-small:24b / qwen2.5-coder:14b fit in 16GB of VRAM with fast generation. 32B versions bleed into RAM and take a serious performance hit, but are still usable.


What is the recommended model for RAG over PDF text documents? I've seen some recommendations for Mistral:7b. Looking to run on a pedestrian consumer home PC (ollama) with an Nvidia 4060 Ti and a Ryzen 5700x.
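Not an answer on which model is best, but as a sketch of the workflow being asked about: a minimal local RAG loop against ollama's HTTP API might look like the following. The endpoints, model names (nomic-embed-text, mistral:7b), and the PDF-chunking step are assumptions; any instruct model that fits the card's VRAM should work for the generation step.

```python
# Minimal sketch of local RAG with ollama's HTTP API.
# Model names and chunking are assumptions -- adjust to what you have pulled.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

# chunks: text extracted from your PDFs ahead of time (e.g. with pypdf)
chunks = ["...pdf chunk 1...", "...pdf chunk 2..."]
index = [(c, embed(c)) for c in chunks]

def answer(question: str, k: int = 3) -> str:
    q = embed(question)
    # rank chunks by cosine similarity to the question, keep the top k
    top = sorted(index, key=lambda ce: -float(
        q @ ce[1] / (np.linalg.norm(q) * np.linalg.norm(ce[1]))))[:k]
    context = "\n\n".join(c for c, _ in top)
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "mistral:7b", "stream": False,
                            "prompt": f"Context:\n{context}\n\nQuestion: {question}"})
    return r.json()["response"]
```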
