DeepSeek sounds really good, but the terms/privacy policy look a bit sketchy (e.g. they grant themselves a full license to use/reproduce inputs and outputs). Is there anywhere feasible to spin up the 236B model for a similarly cheap price in private?
The following quotes are from a reddit comment here: https://www.reddit.com/r/LocalLLaMA/comments/1dkgjqg/comment...

> Under International Data Transfers (in the Privacy Policy): "The personal information we collect from you may be stored on a server located outside of the country where you live. We store the information we collect in secure servers located in the People's Republic of China."

> Under How We Share Your Information > Our Corporate Group (in the Privacy Policy): "The Services are supported by certain entities within our corporate group. These entities process Information You Provide, and Automatically Collected Information for us, as necessary to provide certain functions, such as storage, content delivery, security, research and development, analytics, customer and technical support, and content moderation."

> Under How We Use Your Information (in the Privacy Policy): "Carry out data analysis, research and investigations, and test the Services to ensure its stability and security;"

> Under 4. Intellectual Property (in the Terms): "4.3 By using our Services, you hereby grant us an unconditional, irrevocable, non-exclusive, royalty-free, sublicensable, transferable, perpetual and worldwide licence, to the extent permitted by local law, to reproduce, use, modify your Inputs and Outputs in connection with the provision of the Services."
It's a 236B MoE model with only 21B active parameters. Ollama reports 258k downloads [1] (combined for the 16B and 236B variants), while Hugging Face says it was downloaded 37k times last month [2], and it can run at 25 tok/s on a single M2 Ultra [3].

At $0.14/$0.28 per million input/output tokens, using their APIs is a no-brainer. I understand some people have privacy concerns and want to avoid their APIs, but I personally spend all my time contributing to publicly available OSS code bases, so I'm happy for any OSS LLM to use any of our code bases to improve their models and hopefully also improve the generated code for anyone using our libraries.

Since many LLM orgs are looking to build proprietary moats around their LLMs to maintain their artificially high prices, I'll personally make an effort to use the best OSS LLMs available first (i.e. from DeepSeek, Meta, Qwen or Mistral AI), since they're bringing down the cost of LLMs and aiming to render the technology a commodity.

[1] https://ollama.com/library/deepseek-coder-v2
[2] https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-In...
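The arithmetic behind "no brainer" is easy to check. A minimal sketch, assuming the quoted $0.14/$0.28 per-million-token rates; the monthly usage figures are made up for illustration:

```python
def api_cost(input_tokens, output_tokens, in_per_m, out_per_m):
    """Dollar cost of a workload at per-million-token prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# A hypothetical month of heavy coding assistance: 50M tokens in, 10M out.
deepseek = api_cost(50_000_000, 10_000_000, 0.14, 0.28)
print(f"DeepSeek: ${deepseek:.2f}")  # DeepSeek: $9.80
```

At those rates even heavy usage stays in single-digit dollars per month, which is the point the comment is making.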
When will we have token-flow-aware networking gear?

Surely NVIDIA and others are already doing special traffic shaping for token flows?

What's the current state of such tech/thought/standards/vendors?
There’s no company info on DeepSeek’s website. Looking at the above, and considering that, it seems very sketchy indeed.

Maybe OK for trying things out, but a big no-no for real work.
The names of their researchers are on this recent paper: https://arxiv.org/pdf/2408.15664

Their terms of service say "The DeepSeek Open Platform is jointly owned and operated by Hangzhou DeepSeek Artificial Intelligence Co., Ltd., Beijing DeepSeek Artificial Intelligence Co., Ltd." And they're funded by https://www.high-flyer.cn/en/fund/ which the FT did an article on: https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a...

In terms of the personal data you share when using their models, I can't see why they would be any more or less nefarious than big Western tech companies. That said, if you're using a model based in China, then by providing them with data and feedback you are in a very small way helping researchers in China catch up with/keep up with/overtake researchers in the West. Maybe in the long term that could end badly.

And if you are concerned about the environment, it's entirely possible their training and inference are run on coal power stations.
> There’s no company info on DeepSeek’s website.

It's backed solely by a hedge fund that doesn't want to draw attention to its business. So yeah, about as sketchy as DESRES.
I'm making a small calendar renderer for e-ink screens (https://github.com/skorokithakis/calumny), which Claude basically wrote all of, so I figured I'd try DeepSeek. I had it add a small circle to the left of the "current day" line, which it added fine, but it couldn't solve the problem of the circle not being drawn over another element. It tried and tried, to no avail, until I switched to Claude, which fixed the problem immediately.

43x cheaper is good, but my time is also worth money, and it unfortunately doesn't bode well that it was stumped by the first problem I threw at it.
The models benefit immensely from being trained on more data from other languages, even if you only ever use them in one.

You could also finetune one on your codebases and specific docs for added performance.
Isn't Microsoft's Phi specifically trained for Python? I recall that Phi-1 was advertised as a Python coding helper.

It's a small model trained only on high-quality sources (i.e. textbooks).
I understand your situation. It sounds super simple to me now, but I remember having to spend at least a week trying to get the concepts and figure out what prerequisite knowledge I would need, somewhere on a continuum between just using ChatGPT and learning the relevant vector math etc. Fortunately it is much closer to the ChatGPT side.

I don't like ollama per se (because I can't reuse its models with other frontends, due to it compressing them into its own format), but it's still a very good place to start. Any interface that lets you download models as GGUF from Hugging Face will do just fine. Don't be turned off by the roleplaying/waifu-sounding frontend names. They are all fine. This is what I mostly prefer: https://github.com/oobabooga/text-generation-webui
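On the "download GGUFs from Hugging Face" point, here's a minimal sketch of picking a quant file from a repo's file listing. The repo id in the commented part is an assumption for illustration; the actual download needs the real `huggingface_hub` library and network access:

```python
# Sketch: choose a GGUF quant file from a Hugging Face repo listing.
def pick_quant(filenames, preferred="Q4_K_M"):
    """Pick the GGUF file matching a preferred quantisation, else the first one."""
    ggufs = [f for f in filenames if f.lower().endswith(".gguf")]
    for f in ggufs:
        if preferred.lower() in f.lower():
            return f
    return ggufs[0] if ggufs else None

# Requires `pip install huggingface_hub` and network access (repo id assumed):
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(
#     repo_id="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
#     filename=pick_quant(files_in_repo),  # e.g. a Q4_K_M file
# )
```

Once downloaded, the resulting `.gguf` file can be loaded directly by llama.cpp-based frontends like text-generation-webui, without ollama's repackaging.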
Google's Gemini does.

I can't find the post, but I remember Google published something just after all the ChatGPT SQL-generation hype. It felt like they were trying to counter that hype by explaining that most complex LLM-generated code snippets won't actually run or work, and that they were putting a code-evaluation step after the LLM for Bard. (A bit like: why did they never put an old-fashioned rules-based grammar-checker stage after Google Translate results?) Fast forward to today and it seems to be a normal step for Gemini etc.: https://ai.google.dev/gemini-api/docs/code-execution?lang=py...
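The "code-evaluation step" idea is straightforward to sketch: execute the candidate snippet in a subprocess and only accept it if it runs cleanly. A minimal version (not Google's actual pipeline, just the general shape):

```python
# Sketch: reject LLM-generated snippets that don't actually execute.
import subprocess
import sys

def passes_execution_check(code: str, timeout: float = 5.0) -> bool:
    """Return True if the snippet runs without raising, within the timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

print(passes_execution_check("print(sum(range(10)))"))  # True
print(passes_execution_check("1 / 0"))                  # False
```

A real system would also sandbox the subprocess and compare the output against expectations, but even this bare "does it run?" filter catches a large class of plausible-looking but broken generations.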
Every time someone tells me how AI 10x'd their programming capabilities, I'm like "tell me you're bad at coding without telling me".
It allows me to move much faster, because I can write a comment describing something more high-level and get plausible code from it to review and correct.