Seems like functions could work well to give it an active and distinct choice, but I'm still unsure if the function/parameters are going to be the logical, correct answer...
There's nothing to switch to. You just enable it. No need to change the prompt or anything else. All it requires is that you mention "JSON" in your prompt, which you obviously already do.
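For reference, here is a minimal sketch of what "just enabling it" looks like with the OpenAI v1 chat-completions API. The model name is an illustrative choice, and the helper function is hypothetical; the two actual requirements are `response_format={"type": "json_object"}` and the word "JSON" appearing in the prompt.

```python
def build_json_mode_request(user_prompt: str) -> dict:
    """Assemble chat-completion kwargs with JSON mode enabled."""
    if "json" not in user_prompt.lower():
        # The API rejects JSON mode unless the prompt mentions JSON.
        raise ValueError("prompt must mention 'JSON' when using JSON mode")
    return {
        "model": "gpt-4-turbo",  # any JSON-mode-capable model works
        "response_format": {"type": "json_object"},
        "messages": [{"role": "user", "content": user_prompt}],
    }

request = build_json_mode_request(
    "List three colors as a JSON array under the key 'colors'."
)
# client.chat.completions.create(**request)  # actual network call, omitted here
```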
How much of this is just that one model responds better to the way you write prompts? Much like you working with Bob and opining that Bob is great, and me saying that I find Jack easier to work with.
Have you tried response_format=json_object? I had better luck with function calling to get a structured response, but it is more limiting than just getting a JSON body.
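A hedged sketch contrasting the two approaches, using the shapes of the OpenAI chat-completions API: JSON mode returns a free-form JSON body, while function calling makes the model fill a fixed parameter schema (more structured, but more limiting). The function name and fields below are illustrative.

```python
import json

# Option 1: JSON mode -- free-form JSON body.
json_mode_kwargs = {"response_format": {"type": "json_object"}}

# Option 2: function calling -- the model fills a fixed parameter schema.
tools = [{
    "type": "function",
    "function": {
        "name": "record_person",  # hypothetical function name
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name"],
        },
    },
}]

# With function calling, the structured payload comes back as a JSON string
# in tool_calls[0].function.arguments, e.g.:
arguments = '{"name": "Ada", "age": 36}'
parsed = json.loads(arguments)
```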
Hi! My work is similar and I'd love to have someone to bounce ideas off of if you don't mind. Your profile doesn't have contact info though. Mine does, please send me a message. :)
There are a few improvements I'd suggest with that prompt if you want to maximise its performance.

1. You're really asking for hallucinations here. Asking for factual data is very unreliable, and not what these models are strong at. I'm curious how close/far the results are from ground truth. I would definitely bet that outside of the top 5, the numbers would be wobbly, and outside of the top... 25?, even the ranking would be difficult to trust. Why not just get this from a more trustworthy source?[0]

2. Asking in French might, in my experience, give you results that are not as solid as asking in English. Unless you're asking for a creative task where the model might get confused by EN instructions requiring an FR result, it might be better to ask in EN. And you'll save tokens.

3. Providing the model with a rough example of your output JSON seems to perform better than describing the JSON in plain language.

[0]: https://fr.wikipedia.org/wiki/Liste_des_communes_de_France_l...
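Point 3 above can be sketched as follows: showing the model a rough example of the output JSON rather than describing it in prose. This is only an illustration; the example record and the exact wording are made up.

```python
import json

# A made-up example record illustrating the desired output shape.
example = {"commune": "Paris", "population": 2145906, "rank": 1}

# Embed the example directly in the prompt instead of describing the schema.
prompt = (
    "List the 5 most populous communes in France. "
    "Respond with a JSON array where each element looks like this example:\n"
    + json.dumps(example, indent=2)
)
```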
It can be faster and more effective to fall back to a smaller model (GPT-3.5 or Haiku); the weaknesses of the prompt will be more obvious on a smaller model, and your iteration time will be faster.
Do different versions react to prompts in the same way? I imagined the prompt would be tailored to the quirks of a particular version rather than naturally being stably optimal across versions.
Hey, OP here! The answer is a bit boring: the expenditure definitely has helped customers, in that they're using AI-generated responses in all their workflows all the time in the app, and barely notice it. See what I did there? :)

I'm mostly serious though. One weird thing about our app is that you might not even know we're using AI, unless we literally tell you in the app. And I think that's where we're at with AI and LLMs these days, at least for our use case.

You might find this other post I just put up to have more details too, related to how/where I see the primary value: https://kenkantzer.com/gpt-is-the-heroku-of-ai/
Can you provide some more detail about the application? I'm not familiar with how LLMs are used in business, except as customer support bots returning documentation.
That's for chatting and interfacing conversationally with a human. Using the API is a completely different ballgame because it's not meant to be a back and forth conversation with a human.
How are you sending tabular data in a reliable way? And what is the source document type? I'm trying to solve this for complex financial tables in PDFs right now.
> We always extract json. We don’t need JSON mode

Why? The null stuff would not be a problem if you did, and if you're only dealing with JSON anyway, I don't see why you wouldn't.
I feel like for just extracting data into JSON, smaller LLMs could probably do fine, especially with constrained generation and training on extraction.
> Are we going to achieve Gen AI?

> No. Not with this transformers + the data of the internet + $XB infrastructure approach.

Errr... did they really mean Gen AI, or AGI?
No. But it might help, because you'll probably have to roll some kind of recursive summarization. I think LangChain has mechanisms for that which could save you some time.
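Recursive summarization can also be hand-rolled without a framework. A minimal sketch: chunk the text, summarize each chunk, then recurse on the joined summaries until the result fits. Here `summarize` is a placeholder for a real LLM call (it just truncates), and the size parameters are arbitrary.

```python
def summarize(text: str) -> str:
    # Placeholder: a real implementation would call an LLM here.
    return text[:100]

def recursive_summarize(text: str, chunk_size: int = 1000, max_len: int = 500) -> str:
    """Split text into chunks, summarize each, then recurse on the joined summaries."""
    if len(text) <= max_len:
        return text
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    combined = "\n".join(summarize(c) for c in chunks)
    if len(combined) >= len(text):  # guard against non-shrinking summaries
        return summarize(combined)
    return recursive_summarize(combined, chunk_size, max_len)
```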
https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

== End of toot.

The price you pay for this bullshit in energy, when the sea temperature is literally off the charts and we do not know why, makes it not worth it in my opinion.
Here are my takeaways:
1. There are way too many premature abstractions. LangChain, as one of many examples, might be useful in the future, but at the end of the day prompts are just an API call, and it's easier to write standard code that treats LLM calls as a flaky API call rather than as a special thing.
2. Hallucinations are definitely a big problem. Summarizing is pretty rock solid in my testing, but reasoning is really hard. Action models, where you ask the LLM to take in a user input and decide what to do next, are just really hard; specifically, it's hard to get the LLM to understand the context and to say when it's not sure.
That said, it's still a gamechanger that I can do it at all.
3. I am a bit more hyped than the author that this is a game changer, but like them, I don't think it's going to be the end of the world. There are some jobs that are going to be heavily impacted, and I think we are going to have a rough few years of bots astroturfing platforms. But all in all, I think it's more of a force multiplier than a breakthrough like the internet.
IMHO it's similar to what happened to DevOps in the 2000s: you just don't need a big special team to help you deploy anymore; you hire a few specialists and mostly buy off-the-shelf solutions. Similarly, certain ML tasks are now easy to implement even for dumb-dumb web devs like me.
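Point 1 above, treating an LLM call like any other flaky API call, can be sketched as a plain retry/backoff wrapper instead of a framework. `flaky_llm_call` below is a stand-in that simulates a transient failure; a real version would make the network request.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.0):
    """Call fn(), retrying on failure with simple exponential backoff.

    base_delay is 0.0 here for illustration; set it > 0 in production.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky_llm_call():
    # Simulated flaky endpoint: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return '{"answer": 42}'

result = with_retries(flaky_llm_call)
```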