How much do those input tokens cost? According to https://ai.google.dev/pricing it's $0.70/million input tokens (for long context). That cost is per exchange, so every little back-and-forth will cost around that much if you're using a substantial portion of the context window. And while I haven't tested Gemini, most LLMs get increasingly wonky as the context grows: more likely to fixate, more likely to forget instructions. That big context window could definitely be great for certain tasks (especially information extraction), but it doesn't feel like a generally useful feature.
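
Back-of-envelope, in case anyone wants to plug in their own numbers (a quick sketch; the $0.70/M figure is from the pricing page above, the turn counts are made up):

```python
# Rough per-exchange cost at $0.70 per million input tokens.
# In a chat loop the whole history is resent each turn, so a
# conversation sitting near the full 1M window pays ~$0.70 per turn.
PRICE_PER_M = 0.70  # USD per million input tokens (long-context tier)

def exchange_cost(context_tokens: int) -> float:
    return context_tokens / 1_000_000 * PRICE_PER_M

print(exchange_cost(1_000_000))       # $0.70 for one full-window exchange
print(exchange_cost(1_000_000) * 20)  # $14.00 for a 20-turn conversation
```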
Is there a way to amortize that cost over several queries, i.e. "pre-bake" a document into a context persisted in some form to allow cheaper follow-up queries about it?
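
Roughly what I mean, as a sketch (every number here is a made-up assumption, including the cached-token discount, which is exactly the thing I'm asking whether anyone offers):

```python
# Hypothetical economics of "pre-baking" a document into a persisted context.
PRICE_PER_M = 0.70      # USD per million input tokens (from the pricing page)
DOC_TOKENS = 900_000    # the pre-baked document (assumption)
QUERY_TOKENS = 2_000    # each follow-up question (assumption)
CACHED_RATE = 0.25      # assume cached tokens bill at 25% of the full rate

def naive_cost(n_queries: int) -> float:
    # Resend the whole document with every query.
    return n_queries * (DOC_TOKENS + QUERY_TOKENS) / 1e6 * PRICE_PER_M

def cached_cost(n_queries: int) -> float:
    # Pay full price once, then the discounted rate on the cached portion.
    first = (DOC_TOKENS + QUERY_TOKENS) / 1e6 * PRICE_PER_M
    rest = (n_queries - 1) * (DOC_TOKENS * CACHED_RATE + QUERY_TOKENS) / 1e6 * PRICE_PER_M
    return first + rest

print(naive_cost(10))   # ~$6.31
print(cached_cost(10))  # ~$2.06 under the assumed discount
```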
Can anyone speculate on how Google arrived at this price, and perhaps how it contrasts with how OpenAI arrived at its updated pricing? (Realizing it can't be compared directly to GPT-x at the moment.)
Isn't there retrieval degradation at such a large context size? I would think a RAG system over a 128K context is still better than no RAG + a 1M context window, no? (Assuming text only.)
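
Something like the sketch below, where retrieval trims the corpus to fit a 128K budget instead of shipping all 1M tokens every turn (the scoring here is naive word overlap just to show the shape; a real system would use embeddings):

```python
# Minimal RAG-style filtering: rank chunks against the query and keep
# only what fits the context budget, rather than sending everything.
def score(chunk: str, query: str) -> int:
    q = set(query.lower().split())
    return sum(1 for word in chunk.lower().split() if word in q)

def build_context(chunks: list[str], query: str, budget_tokens: int = 128_000) -> str:
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    picked, used = [], 0
    for c in ranked:
        approx = len(c.split())  # crude token estimate
        if used + approx > budget_tokens:
            break
        picked.append(c)
        used += approx
    return "\n\n".join(picked)
```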
You don't really use it, right? There's no way to debug when you're doing it like this. Also, the accuracy isn't high, and it can't handle complicated questions, making it of little use for the cost.
> signal from the community that you went overboard

What signal are you referring to? That you and one other person are making lowbrow comments that do not enhance the conversation?
A lightweight model that you can only use in the cloud? That is amusing. These tech megacorps are really intent on owning your usage of AI. But we must not let that be the future.
Presumably we are defining smartness as doing more with less, so this indicates they have something going on in the latent space which will scale.
It's ironic that when you ask these AI chatbots what their own context size is, they don't know. ChatGPT doesn't even know 4o exists when you ask 4o itself.
I've been diligently trying to use Gemini 1.5 Pro, and it is not even on the level of Llama3-70B. I really hope Gemini improves, even if that means a reduced context length.
Uh, guys, yeah... Adobe are on the phone saying something about trademark infringement; apparently Flash is something else? I don't know, I've never heard of it...
Last I checked, you could disable the safety triggers as an API user with Gemini (which doesn't alleviate your obligation to follow the ToS as to how you use the model).
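
Per the API docs at the time, it looked something like this in the Python SDK (model name, key, and prompt are placeholders):

```python
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

genai.configure(api_key="YOUR_API_KEY")  # placeholder

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    "your prompt here",
    # Relax the default safety filters; ToS obligations still apply.
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)
print(response.text)
```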
I'm not working with a company that can just write in the ToS "we can do anything we want. lol. lmao" and expect me to follow it religiously. Corporations need less control over speech, not more.
It says it may fall back to a worse model under load, and there is no way to tell which one you are getting. I think ChatGPT has at times done something similar, though.
Not bad compared to rolling your own, but among frontier models Gemini's main competitive differentiator was native multimodality. With the release of GPT-4o, I'm not clear on why an organization not bound to GCP would pick Gemini. A 128k context (4o's) is fine unless you're processing whole books or movies at once. Is anyone doing this at scale in a way that can't be filtered down from 1M to 100k?