Easy head math: parameter count times parameter size, plus 20-40% for inference slop space. Anywhere from 8-40 GB of VRAM required depending on the quantization level being used.
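
As a rough sketch of that head math (the 12B parameter count, the overhead factor and the quantization sizes below are illustrative assumptions, not official figures):

```python
# VRAM estimate: parameter count * bytes per parameter, plus ~20-40%
# headroom for activations, KV cache and other inference overhead.
PARAMS = 12e9  # assumed 12B parameters

def vram_estimate_gb(bytes_per_param: float, overhead: float = 0.3) -> float:
    weights_gb = PARAMS * bytes_per_param / 1e9
    return weights_gb * (1 + overhead)

for name, size in [("fp16", 2.0), ("fp8/int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name:8s} ~{vram_estimate_gb(size):.0f} GB")
```

With those assumptions you land at roughly 31 GB for fp16, 16 GB for fp8, and 8 GB for 4-bit, which is where the 8-40 GB range comes from once you vary the overhead.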

They did quantization-aware training for FP8, so you won't get any benefit from using more than 12 GB of RAM for the parameters. Where you might use more RAM is the much bigger context window.
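
To put rough numbers on the context-window part, here is a back-of-the-envelope KV-cache estimate; the layer/head counts below are placeholder assumptions for illustration, not Mistral NeMo's published architecture:

```python
# KV-cache size grows linearly with sequence length:
# 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes per element.
# The architecture numbers here are placeholder assumptions.
def kv_cache_gb(seq_len: int, layers: int = 40, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

print(f"  8k tokens: ~{kv_cache_gb(8_192):.1f} GB")
print(f"128k tokens: ~{kv_cache_gb(131_072):.1f} GB")
```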

> Mistral NeMo uses a new tokenizer, Tekken, based on Tiktoken, that was trained on more than 100 languages, and compresses natural language text and source code more efficiently than the SentencePiece tokenizer used in previous Mistral models.

Does anyone have a good answer for why everyone went back to SentencePiece in the first place? Byte-pair encoding (which is what tiktoken uses: https://github.com/openai/tiktoken) was shown to be a more efficient encoding as far back as GPT-2 in 2019.
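
For anyone who wants to poke at BPE behaviour directly, tiktoken makes it easy to see how text gets split; the cl100k_base encoding below is just an example vocabulary for illustration, not Tekken:

```python
# Inspect byte-pair encoding with tiktoken; cl100k_base is only an example
# vocabulary, not the Tekken tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "def tokenize(s): return s.split()"
tokens = enc.encode(text)
print(f"{len(text)} characters -> {len(tokens)} tokens")
print([enc.decode([t]) for t in tokens])  # the individual BPE pieces
```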

Nvidia has a blog post about Mistral NeMo, too: https://blogs.nvidia.com/blog/mistral-nvidia-ai-model/

> Mistral NeMo comes packaged as an NVIDIA NIM inference microservice, offering performance-optimized inference with NVIDIA TensorRT-LLM engines.
>
> *Designed to fit on the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090 or NVIDIA RTX 4500 GPU*, the Mistral NeMo NIM offers high efficiency, low compute cost, and enhanced security and privacy.
>
> The model was trained using Megatron-LM, part of NVIDIA NeMo, with 3,072 H100 80GB Tensor Core GPUs on DGX Cloud, composed of NVIDIA AI architecture, including accelerated computing, network fabric and software to increase training efficiency.

I believe that if Mistral is serious about advancing open source, they should consider sharing the corpus used for training their models, at least the base models' pretraining data.

I don't have an explanation, but I can point you to one of the papers: https://arxiv.org/pdf/2309.12288, which calls it "the reversal curse" and runs a bunch of experiments showing that models that succeed at questions like "Who is Tom Cruise’s mother?" (Mary Lee Pfeiffer) will not be equally successful at answering "Who is Mary Lee Pfeiffer’s son?"

Isn't that specific case just a matter of not having enough data _explicitly_ stating the reverse? Seems as if they are indeed stochastic parrots from that perspective.

Anecdata, but I did some continued pretraining on a toy LLM using machine-translated versions of the original dataset. Performance improved across all benchmarks, in English (the original language).

You will need enough VRAM; a 1080 Ti is not going to work very well, so maybe get a 3090 with 24 GB of VRAM. I think it should also run well on a 36 GB MacBook Pro, or probably a 24 GB MacBook Air.

It's just easier to iterate and improve on a coding-specialist AI when coding is also the skill required to iterate on said AI.

Products that build on general LLM tech are already being used in other fields. For example, my lawyer friend has started using one by LexisNexis[0] and is duly impressed by how it works. It's only a matter of time before models like that get increasingly specialized for that kind of work; it's just harder for lawyers to drive that kind of change alone. Plus, there's a lot more resistance in 'legacy' professions to any kind of change, much less one that is perceived to threaten the livelihoods of established professionals. Current LLMs are already not bad at a lot of things, but lawyer bots, accountant bots and more are likely coming.

[0] https://www.lexisnexis.com/en-us/products/lexis-plus-ai.page

Those are regulated industries, whereas software development is not.

Bad code spat back by an AI won't compile. Bad financial or legal advice spat back by an AI bankrupts people.

Generating code has a significant economic benefit: once generated, the code can be executed many times without requiring heavy computing resources, unlike an AI model.

Then again, we just had this on the front page: https://news.ycombinator.com/item?id=40957990

> We first document a significant decline in stock trading volume during ChatGPT outages and find that the effect is stronger for firms with corporate news released immediately before or during the outages. We further document similar declines in the short-run price impact, return variance, and bid-ask spreads, consistent with a reduction in informed trading during the outage periods. Lastly, we use trading volume changes during outages to construct a firm-level measure of the intensity of GAI-assisted trading and provide early evidence of a positive effect of GAI-assisted trading on long-run stock price informativeness.

They're being used, but nobody is really saying anything, because the stock market is a zero-sum game these days and letting anyone else know that this holds water is a recipe for competition. Programming is about the opposite: the more you give, the more you get, so it makes sense to popularize it as a feature.

Congrats. Very exciting to see continued innovation around smaller models that can perform much better than larger models. This enables faster inference and makes them more ubiquitous.

Worth noting this model has 50% more parameters than Llama 3 (12B vs 8B). There are performance gains, but some of them might come from simply using more compute rather than from better performance per unit of compute.

What's the reason for measuring the model size in context window length and not GB?

Also, are these small models OSS? Easier self-hosting seems to be the main benefit for small models.

> We have released pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license to promote adoption for researchers and enterprises. Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.

So that's... uniformly an improvement at just about everything, right? Large context, permissive license, should have good perf. The one thing I can't tell is how big 12B is going to be (read: how much VRAM/RAM is this thing going to need). Annoyingly, and rather confusingly for a model under Apache 2.0, https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 refuses to show me the files unless I log in and "You need to agree to share your contact information to access this model"... though if it's actually as good as it looks, I give it hours before it's reposted without that restriction, which Apache 2.0 allows.