(comments)

Original link: https://news.ycombinator.com/item?id=43683410

Hacker News is discussing OpenAI's GPT-4.1, which is currently available only through the API. Tiberium points out that, unlike the continuously updated ChatGPT model (chatgpt-4o-latest), GPT-4.1 aims to provide stability. Minimaxir counters that the API allows pinning specific versions (e.g., gpt-4o-2024-05-13). Codingwagie defends OpenAI's numeric naming system against criticism, while Philpax questions its clarity, asking how one would rank the various GPT models by capability without consulting documentation.

There is debate over how GPT-4.1 performs against other models such as Claude 3.7 and Gemini 2.5 Pro, with differing benchmark results cited in the discussion. Some speculate that this release is a defensive move by OpenAI on coding performance. Runako questions its value compared with o3-mini-high in ChatGPT.

Users criticize OpenAI's naming conventions as confusing. The discussion also covers the models' pricing, context window sizes, and overall value proposition, and some users are frustrated that GPT-4.1 is not available in ChatGPT.

Related articles
  • (comments) 2025-02-28
  • (comments) 2025-03-25
  • New models and developer products 2023-11-07
  • (comments) 2025-03-19
  • (comments) 2025-04-14

  • Original article
    GPT-4.1 in the API (openai.com)
    93 points by maheshrijal 26 minutes ago | 51 comments

    Very important note:

    > Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o

    If anyone here doesn't know, OpenAI does offer the ChatGPT model version in the API as chatgpt-4o-latest, but it's bad because they continuously update it, so businesses can't rely on it being stable. That's why OpenAI made GPT-4.1.



    OpenAI (and most LLM providers) allows model version pinning for exactly this reason; e.g., in the case of GPT-4o you can specify gpt-4o-2024-05-13, gpt-4o-2024-08-06, or gpt-4o-2024-11-20.

    https://platform.openai.com/docs/models/gpt-4o
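
    As a concrete sketch (assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; the prompt is just a placeholder), pinning a dated snapshot looks like this:

        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        # Pinning a dated snapshot instead of "chatgpt-4o-latest" keeps
        # behavior stable even as OpenAI ships newer snapshots.
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # a pinned snapshot from the docs above
            messages=[{"role": "user", "content": "Say hello."}],
        )
        print(response.choices[0].message.content)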



    > chatgpt-4o-latest, but it's bad because they continuously update it

    A version explicitly marked as "latest" being continuously updated? Crazy.



    So you're saying that "ChatGPT-4o-latest (2025-03-26)" in LMarena is 4.1?


    No, that is saying that some of the improvements that went into 4.1 have also gone into ChatGPT, including chatgpt-4o-latest (2025-03-26).


    Who has the numbers vs. Claude 3.7 Sonnet and Gemini 2.5 Pro? And is it available in Cursor yet?


    GPT-4.1 is probably a distilled version of GPT-4.5.

    I don't understand the constant complaining about naming conventions. The number system differentiates the models based on capability; any other method would not do that. After ten models with random names like "gemini" and "nebula", you would have no idea which is which. It's a low-IQ take. You don't name new versions of software as completely different software.

    Also, yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried. I have 15 years of backend experience at FAANG. Software will get automated, and it already is; people just haven't figured it out yet.



    > The number system differentiates the models based on capability; any other method would not do that.

    Please rank GPT-4, GPT-4 Turbo, GPT-4o, GPT-4.1, GPT-4.5, o1-mini, o1, o1 pro, o3-mini, o3-mini-high, o3, and o4-mini in terms of capability without consulting any documentation.



    Btw, as someone who agrees with your point, what’s the actual answer to this?


    There's no single ordering -- it really depends on what you're trying to do, how long you're willing to wait, and what kinds of modalities you're interested in.


    Very easy with the naming system?


    Really? Is o3-mini-high better than o1-pro?


    > You don't name new versions of software as completely different software

    macOS releases would like a word with you.

    https://en.wikipedia.org/wiki/MacOS#Timeline_of_releases

    Technically they still have numbers, but Apple hides them in marketing copy.

    https://www.apple.com/macos/macos-sequoia/

    Though they still have “macOS” in the name.



    Feel free to lay the naming convention rules out for us, man.


    Just add SemVer with an extra tag:

    4.0.5.worsethan4point5



    > Yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried.

    Exactly. Those who do frontend or focus on pretty much anything JavaScript are, how should I say it? Cooked?

    > Software will get automated

    The first to go are those that use JavaScript / TypeScript; those engineers have already been automated out of a job. It is all over for them.



    Yeah, it's over for them. Complicated business logic and sprawling systems are what are keeping backend safe for now. But big front-end codebases, where individual files (like React components) are largely decoupled from the rest of the codebase, are why front end is completely cooked.


    > I don't understand the constant complaining about naming conventions.

    Oh man. Unfolding my lawn chair and grabbing a bucket of popcorn for this discussion.



    > using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning

    AI is amazing, now all you need to create a stunning UI is for someone else to make it first so your AI can rip it off. Not beating the "plagiarism machine" allegations here.



    Here's a secret: most of the highest-funded, VC-backed software companies are just copying a competitor with a slight product spin / different pricing model.


    Exactly, they like to call it “bringing new energy to an old industry”.


    Got any examples?


    The lack of benchmark comparisons to other models, especially Gemini 2.5 Pro, is telling.


    Gemini 2.5 Pro gets 64% on SWE-bench Verified. Sonnet 3.7 gets 70%.

    They are reporting that GPT-4.1 gets 55%.



    Are those with «thinking» or without?


    Big focus on coding. It feels like a defensive move against Claude (and, more recently, Gemini Pro), which became very popular in that domain. I guess they recently figured out some ways to train the model for this "agentic" coding through RL or something, and the finding was too new to make it into 4.5 in time.


    ChatGPT currently recommends I use o3-mini-high ("great at coding and logic") when I start a code conversation with 4o.

    I don't understand why the comparison in the announcement talks so much about 4o's coding abilities versus 4.1's. Wouldn't the relevant comparison be to o3-mini-high?

    4.1 costs a lot more than o3-mini-high, so this seems like a pertinent thing for them to have addressed here. Maybe I am misunderstanding the relationship between the models?



    It's not the point of the announcement, but I do like the use of the (abs) subscript to mark the improvement in LLM performance, since in these kinds of benchmark descriptions I can never tell whether the percentage increase is absolute or relative.
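
    As a quick illustration with made-up numbers: a jump from a 55% score to a 66% score is an 11-point gain "(abs)", but a 20% gain if reported relative to the baseline:

        baseline, improved = 0.55, 0.66  # hypothetical benchmark scores

        absolute_gain = improved - baseline               # 0.11 -> "+11% (abs)", percentage points
        relative_gain = (improved - baseline) / baseline  # 0.20 -> "+20%" relative to the baseline

        print(f"{absolute_gain:.0%} absolute vs {relative_gain:.0%} relative")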


    Pretty wild versioning that GPT-4.1 is newer and better in many regards than GPT-4.5.


    I think they're doing it deliberately at this point


    Tomorrow they are releasing the open source GPT-1.4 model :P


    Testing against unspecified other "leading" models allows for shenanigans:

    > Qodo tested GPT‑4.1 head-to-head against other leading models [...] they found that GPT‑4.1 produced the better suggestion in 55% of cases

    The linked blog post returns a 404: https://www.qodo.ai/blog/benchmarked-gpt-4-1/



    > Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases.

    The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They frame this as if it were unnecessary to release 4.1 in ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is than GPT-4o.

    One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.



    There's a HUGE difference that you are not mentioning: there are both "gpt-4o" and "chatgpt-4o-latest" in the API. The former is the stable version (there are a few snapshots, but the newest one has been there for a while), and the latter is the fine-tuned version that they often update on ChatGPT. All those benchmarks were done against the stable API version of GPT-4o, since that's what businesses rely on, not "chatgpt-4o-latest".


    I disagree. From the average user's perspective, it's quite confusing to see half a dozen models to choose from in the UI. In an ideal world, ChatGPT would just abstract away the decision, so I wouldn't need to be an expert in the relatively minor differences between each model to have a good experience.

    In the API, by contrast, I want very strict versioning of the models I'm using, letting me run my own evals and pick the model that works best.



    For conversational AI, the most significant part is GPT-4.1 being 2x faster than GPT-4o with basically the same reasoning capabilities.


    Does anyone have benchmarks comparing it to other models?


    Claude 3.7 no thinking (diff) - 60.4%

    Claude 3.7 32k thinking tokens (diff) - 64.9%

    GPT-4.1 (diff) - 52.9% (stat is from the blog post)

    https://aider.chat/docs/leaderboards/



    GPT-4.1 Pricing (per 1M tokens):

    gpt-4.1

    - Input: $2.00

    - Cached Input: $0.50

    - Output: $8.00

    gpt-4.1-mini

    - Input: $0.40

    - Cached Input: $0.10

    - Output: $1.60

    gpt-4.1-nano

    - Input: $0.10

    - Cached Input: $0.025

    - Output: $0.40
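
    As a sanity check on those rates, here is a small sketch estimating the cost of a single gpt-4.1 request (the token counts are made up, and it assumes cached tokens are counted as part of the prompt):

        # USD per 1M tokens, from the gpt-4.1 list above.
        PRICE_INPUT, PRICE_CACHED_INPUT, PRICE_OUTPUT = 2.00, 0.50, 8.00

        def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
            # Cached tokens bill at the cached rate; the rest of the prompt at the full rate.
            return (
                (input_tokens - cached_tokens) * PRICE_INPUT
                + cached_tokens * PRICE_CACHED_INPUT
                + output_tokens * PRICE_OUTPUT
            ) / 1_000_000

        # e.g. a 100k-token prompt with 80k of it cache-hit, plus 2k output tokens:
        print(f"${request_cost(100_000, 80_000, 2_000):.4f}")  # -> $0.0960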



    The cached input price is notable here: previously, with GPT-4o, it was 1/2 the cost of raw input; now it's 1/4.

    It's still not as notable as Claude's 1/10th the cost of raw input, but it shows OpenAI is making improvements in this area.



    The big change in this announcement is the 1M context window on all models.

    But the price is what matters.



    Nothing compared to Llama 4's 10M. What matters is how well the model performs with such a long context, not what the technical maximum is.


    It seems that OpenAI is really differentiating itself in the AI market by developing the most incomprehensible product names in the history of software.


    They learned from the best: Microsoft


    GpTeams Classic


    GPT 4 Workgroups


    "Hey buddy, want some .Net, oh I mean dotnet"


    I wonder how they decide whether the o or the digit comes first (e.g., o3 vs 4o).


    I need an AI to understand the naming conventions that OpenAI is using.


    > We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months, on July 14, 2025, to allow time for developers to transition.

    Well, that didn't last long.



    So we're going back... .4 of a GPT? Make it make sense, OpenAI...





