What are the most obvious standouts?
In my experience, smaller models tend to do well on benchmarks and fail at generalization. Phi-2 comes to mind.
I think this is just due to better non-English training data.
It's 15 ELO under Llama-3-70B on English hard prompts and 41 ELO under it on general English (only the latter gap is statistically significant).
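(For scale: an Elo gap maps to an expected head-to-head win rate via the standard Elo expected-score formula. The Arena leaderboard actually uses a Bradley-Terry fit, but the interpretation is close enough for a rough Python sketch.)

```python
def expected_win_rate(elo_gap: float) -> float:
    """Expected score of the lower-rated model against one rated `elo_gap` points higher."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))

for gap in (15, 41):
    print(f"{gap} ELO below -> expected win rate of roughly {expected_win_rate(gap):.1%}")
# 15 ELO below -> roughly 47.8%; 41 ELO below -> roughly 44.1%
```

So even the statistically significant 41-point gap corresponds to winning a bit over 44% of head-to-head votes.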
Hello (again) from the Gemma team! We are quite excited to push this release out and happy to answer any questions!
Opinions are our own and not those of Google DeepMind.
I create new accounts because I use HN too much.
I use GCP professionally every day and have always found it quite intuitive. I did plenty of image classification with Vertex AI too.
I also work at Google and on Gemma (so the same disclaimers apply).
You can try the 27B model at www.aistudio.google.com. Send in your favorite prompts, and we hope you like the responses.
Why is AI Studio not available in Ukraine? I have no problem using the Gemini web UI or other LLM providers from Ukraine, but this Google API constraint is strange.
Thanks for your work on this; excited to try it out!
The Google API models support 1M+ tokens, but these are limited to 8K. Is that due to a fundamental architecture difference, the training setup, or something else?
To quote Ludovic Peran, our amazing safety lead:
The literature has identified self-proliferation as a dangerous capability of models, and details about how to define it and examples of the forms it can take have been openly discussed by GDM (https://arxiv.org/pdf/2403.13793). The current Gemma 2 models' success rate on end-to-end challenges is zero (0 out of 10), so their capability to perform such tasks is currently limited.
There was no parameter creep with Llama. Llama 8B is actually a ~7B model comparable to Mistral 7B if you strip away multilingual embeddings and match what Mistral 7B supports.
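A rough back-of-the-envelope check on that claim, as a sketch. The totals and vocabulary sizes below are commonly cited approximations, assumed here for illustration rather than taken from either model card:

```python
def non_embedding_params(total: float, vocab: int, hidden: int, tied: bool = False) -> float:
    """Subtract the token-embedding matrix (and the output head, if embeddings are untied)."""
    return total - vocab * hidden * (1 if tied else 2)

# Assumed, approximate configs:
#   Llama-3-8B : ~8.03e9 total params, vocab 128,256, hidden size 4096, untied embeddings
#   Mistral-7B : ~7.24e9 total params, vocab  32,000, hidden size 4096, untied embeddings
print(f"Llama-3-8B non-embedding params: {non_embedding_params(8.03e9, 128_256, 4096) / 1e9:.2f}B")
print(f"Mistral-7B non-embedding params: {non_embedding_params(7.24e9, 32_000, 4096) / 1e9:.2f}B")
```

Both land around 7B; most of the headline gap is the embedding and output-head matrices needed for Llama 3's much larger vocabulary.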
You're confusing it with data poisoning.
Model collapse itself is (was?) a fairly serious research topic (https://arxiv.org/abs/2305.17493). By now we've reached "probably not inevitable": https://arxiv.org/abs/2404.01413 argues there is a finite upper bound on the error, but I'd also point out that that paper assumes the amount of training data grows with the number of training generations and is strictly accumulative. To first order, that means you had better have a pre-2022 dataset to get started, and have archived it well. But it's probably fair to say the current SOTA view is still more or less "it's neither impossible nor inevitable".
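A toy sketch of the setting those papers study, assuming a 1-D Gaussian stands in for the model and an accumulate-vs-replace switch mirrors the assumption mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitted_stds(n_gen: int = 300, n_per_gen: int = 50, accumulate: bool = False) -> list:
    """Fit a Gaussian to the current data, sample the next generation from the fit, repeat."""
    data = rng.normal(0.0, 1.0, n_per_gen)            # original "human" data ~ N(0, 1)
    stds = []
    for _ in range(n_gen):
        mu, sigma = data.mean(), data.std()
        stds.append(sigma)
        synthetic = rng.normal(mu, sigma, n_per_gen)  # next generation trains on model output
        data = np.concatenate([data, synthetic]) if accumulate else synthetic
    return stds

print("replace data each generation:", [f"{s:.3f}" for s in fitted_stds()[::75]])
print("accumulate data across gens :", [f"{s:.3f}" for s in fitted_stds(accumulate=True)[::75]])
```

With replacement the fitted spread keeps shrinking (collapse); with accumulation it stays close to the original distribution, which is the intuition behind the finite error bound.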
Just downloaded, looks great. Love the synced split view.
But I'm not seeing Gemma 2 or Claude 3.5 Sonnet, even though they're announced on your landing page.
Another take on this: Phi-3 Small has 1100 ELO on LMSYS (ranked #52), while the confidence interval for Gemma 2 9B is [1170, 1200] ELO (ranked between #15 and #25).
Worse in some aspects, better in others.
Small models are never going to be generalists, so having several small models allows you to pick the one that best fits your needs.
Nice! Can you explain what you mean by "simulate training beyond the number of available tokens"?
Why does using distillation from a larger model simulate training with more tokens?
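For background, and not necessarily the exact recipe Gemma 2 used: in knowledge distillation the student is trained to match the teacher's full next-token distribution rather than a single observed token, so each position in the text carries much more signal, which is the sense in which it can stand in for extra tokens. A minimal PyTorch-style sketch of such a loss:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary; logits have shape (batch * seq_len, vocab)."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # With plain next-token training the target is one token id; here it is a whole distribution.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```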
Do we know if the Gemma models are fundamentally different from the ones hosted as Gemini? Gemini 1.5 Flash seems to produce good results for its price and performance.
Are these small Gemma 2 distilled models available anywhere? I'm not finding them on huggingface.co, etc., but maybe I don't know the exact model names they are published under.
Are the weights released yet?
That's actually the particular one I was looking for and couldn't find. I had also googled for the other ones, but maybe the release was so recent that it hadn't been indexed yet. Thanks!
I suppose it would act as a concrete separator when instruct tuning, but lots of prompt templates don't use it, especially older ones like Alpaca. Maybe it leads to more overall coherence?
Not instruct tuning; you use it in general training.
If you have a bunch of small prompts/answers, you can fit them into bigger batches if you use start/stop tokens.
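A rough sketch of that packing idea (the helper below is hypothetical, plain Python): delimit each tokenized example with start/stop tokens, concatenate, and slice the stream into fixed-length training sequences so short prompt/answer pairs don't each waste a padded row.

```python
from typing import List

def pack_examples(examples: List[List[int]], seq_len: int,
                  bos_id: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized examples, delimited by start/stop tokens,
    then cut the stream into fixed-length training sequences."""
    stream: List[int] = []
    for ids in examples:
        stream.append(bos_id)
        stream.extend(ids)
        stream.append(eos_id)
    # Drop the ragged tail for simplicity; real pipelines usually carry it into the next batch.
    return [stream[i:i + seq_len] for i in range(0, len(stream) - seq_len + 1, seq_len)]
```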
It's exceptionally strong. In the LMSYS Chatbot Arena, the 27B version scores above Llama-3-70B, at the level of OpenAI GPT-4 and Claude 3 Sonnet!