(comments)

Original link: https://news.ycombinator.com/item?id=43683410

Hacker News is discussing OpenAI's GPT-4.1, which is currently available only through the API. Tiberium points out that, unlike the continuously updated ChatGPT model (chatgpt-4o-latest), GPT-4.1 aims to provide stability. Minimaxir counters that the API allows pinning specific versions (e.g., gpt-4o-2024-05-13). Codingwagie defends OpenAI's numeric naming system against criticism, while Philpax questions its clarity, asking how one would rank the various GPT models by capability without consulting documentation.

There is debate over how GPT-4.1 performs against other models such as Claude 3.7 and Gemini 2.5 Pro, with differing benchmark results cited in the discussion. Some speculate that this release is a defensive move by OpenAI on coding performance. Runako questions its value compared with o3-mini-high in ChatGPT.

Users criticize OpenAI's naming conventions as confusing. The discussion also covers the models' pricing, context window sizes, and overall value proposition, and some users are frustrated that GPT-4.1 is not available in ChatGPT.

Related articles
  • (comments) 2025-02-28
  • (comments) 2025-03-25
  • New models and developer products 2023-11-07
  • (comments) 2025-03-19
  • (comments) 2025-04-14

  • Original article
    GPT-4.1 in the API (openai.com)
    93 points by maheshrijal 26 minutes ago | 51 comments

    Very important note:

    > Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o

    If anyone here doesn't know, OpenAI does offer the ChatGPT model version in the API as chatgpt-4o-latest, but it's bad because they continuously update it, so businesses can't rely on it being stable. That's why OpenAI made GPT-4.1.



    OpenAI (and most LLM providers) allows model version pinning for exactly this reason; e.g., in the case of GPT-4o you can specify gpt-4o-2024-05-13, gpt-4o-2024-08-06, or gpt-4o-2024-11-20.

    https://platform.openai.com/docs/models/gpt-4o
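
    As a concrete sketch (assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; the prompt is just a placeholder), pinning a dated snapshot looks like this:

        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        # Pinning a dated snapshot instead of "chatgpt-4o-latest" keeps
        # behavior stable even as OpenAI ships newer snapshots.
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # a pinned snapshot from the docs above
            messages=[{"role": "user", "content": "Say hello."}],
        )
        print(response.choices[0].message.content)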



    > chatgpt-4o-latest, but it's bad because they continuously update it

    A version explicitly marked as "latest" being continuously updated? Crazy.



    So you're saying that "ChatGPT-4o-latest (2025-03-26)" in LMarena is 4.1?


    No, that is saying that some of the improvements that went into 4.1 have also gone into ChatGPT, including chatgpt-4o-latest (2025-03-26).


    Who has the numbers vs. Claude 3.7 Sonnet and Gemini 2.5 Pro? And is it available in Cursor yet?


    GPT-4.1 is probably a distilled version of GPT-4.5.

    I don't understand the constant complaining about naming conventions. The number system differentiates the models based on capability; any other method would not do that. After ten models with random names like "gemini" and "nebula", you would have no idea which is which. It's a low-IQ take. You don't name new versions of software as completely different software.

    Also, yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried. I have 15 years of backend experience at FAANG. Software will get automated, and it already is; people just haven't figured it out yet.



    > The number system differentiates the models based on capability; any other method would not do that.

    Please rank GPT-4, GPT-4 Turbo, GPT-4o, GPT-4.1, GPT-4.5, o1-mini, o1, o1 pro, o3-mini, o3-mini-high, o3, and o4-mini in terms of capability without consulting any documentation.



    Btw, as someone who agrees with your point, what’s the actual answer to this?


    There's no single ordering -- it really depends on what you're trying to do, how long you're willing to wait, and what kinds of modalities you're interested in.


    Very easy with the naming system?


    Really? Is o3-mini-high better than o1-pro?


    > You don't name new versions of software as completely different software

    macOS releases would like a word with you.

    https://en.wikipedia.org/wiki/MacOS#Timeline_of_releases

    Technically they still have numbers, but Apple hides them in marketing copy.

    https://www.apple.com/macos/macos-sequoia/

    Though they still have “macOS” in the name.



    Feel free to lay the naming convention rules out for us, man.


    Just add SemVer with an extra tag:

    4.0.5.worsethan4point5



    > Yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried.

    Exactly. Those who do frontend or focus on pretty much anything JavaScript are, how should I say it? Cooked?

    > Software will get automated

    The first to go are those that use JavaScript / TypeScript; those engineers have already been automated out of a job. It is all over for them.



    Yeah, it's over for them. Complicated business logic and sprawling systems are what are keeping backend safe for now. But big front-end codebases, where individual files (like React components) are largely decoupled from the rest of the codebase, are why front end is completely cooked.


    > I don't understand the constant complaining about naming conventions.

    Oh man. Unfolding my lawn chair and grabbing a bucket of popcorn for this discussion.



    > using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning

    AI is amazing, now all you need to create a stunning UI is for someone else to make it first so your AI can rip it off. Not beating the "plagiarism machine" allegations here.



    Here's a secret: most of the highest-funded, VC-backed software companies are just copying a competitor with a slight product spin / different pricing model.


    Exactly, they like to call it “bringing new energy to an old industry”.


    Got any examples?


    The lack of benchmark comparisons to other models, especially Gemini 2.5 Pro, is telling.


    Gemini 2.5 Pro gets 64% on SWE-bench Verified. Sonnet 3.7 gets 70%.

    They are reporting that GPT-4.1 gets 55%.



    Are those with «thinking» or without?


    Big focus on coding. It feels like a defensive move against Claude (and, more recently, Gemini Pro), which became very popular in that domain. I guess they recently figured out some ways to train the model for this "agentic" coding through RL or something, and the finding was too new to make it into 4.5 in time.


    ChatGPT currently recommends I use o3-mini-high ("great at coding and logic") when I start a code conversation with 4o.

    I don't understand why the comparison in the announcement talks so much about 4o's coding abilities versus 4.1's. Wouldn't the relevant comparison be to o3-mini-high?

    4.1 costs a lot more than o3-mini-high, so this seems like a pertinent thing for them to have addressed here. Maybe I am misunderstanding the relationship between the models?



    It's not the point of the announcement, but I do like the use of the (abs) subscript to mark the improvement in LLM performance, since in these kinds of benchmark descriptions I can never tell whether the percentage increase is absolute or relative.
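
    As a quick illustration with made-up numbers: a jump from a 55% score to a 66% score is an 11-point gain "(abs)", but a 20% gain if reported relative to the baseline:

        baseline, improved = 0.55, 0.66  # hypothetical benchmark scores

        absolute_gain = improved - baseline               # 0.11 -> "+11% (abs)", percentage points
        relative_gain = (improved - baseline) / baseline  # 0.20 -> "+20%" relative to the baseline

        print(f"{absolute_gain:.0%} absolute vs {relative_gain:.0%} relative")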


    Pretty wild versioning that GPT-4.1 is newer and better in many regards than GPT-4.5.


    I think they're doing it deliberately at this point


    Tomorrow they are releasing the open source GPT-1.4 model :P


    Testing against unspecified other "leading" models allows for shenanigans:

    > Qodo tested GPT‑4.1 head-to-head against other leading models [...] they found that GPT‑4.1 produced the better suggestion in 55% of cases

    The linked blog post returns a 404: https://www.qodo.ai/blog/benchmarked-gpt-4-1/



    > Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT‑4o, and we will continue to incorporate more with future releases.

    The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They frame this as if it were unnecessary to release 4.1 in ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is than GPT-4o.

    One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.



    There's a HUGE difference that you are not mentioning: there are both "gpt-4o" and "chatgpt-4o-latest" in the API. The former is the stable version (there are a few snapshots, but the newest one has been there for a while), and the latter is the fine-tuned version that they often update on ChatGPT. All those benchmarks were done against the stable API version of GPT-4o, since that's what businesses rely on, not "chatgpt-4o-latest".


    I disagree. From the average user's perspective, it's quite confusing to see half a dozen models to choose from in the UI. In an ideal world, ChatGPT would just abstract away the decision, so I wouldn't need to be an expert in the relatively minor differences between each model to have a good experience.

    In the API, by contrast, I want very strict versioning of the models I'm using, letting me run my own evals and pick the model that works best.



    For conversational AI, the most significant part is GPT-4.1 being 2x faster than GPT-4o with basically the same reasoning capabilities.


    Does anyone have benchmarks comparing it to other models?


    Claude 3.7 no thinking (diff) - 60.4%

    Claude 3.7 32k thinking tokens (diff) - 64.9%

    GPT-4.1 (diff) - 52.9% (stat is from the blog post)

    https://aider.chat/docs/leaderboards/



    GPT-4.1 Pricing (per 1M tokens):

    gpt-4.1

    - Input: $2.00

    - Cached Input: $0.50

    - Output: $8.00

    gpt-4.1-mini

    - Input: $0.40

    - Cached Input: $0.10

    - Output: $1.60

    gpt-4.1-nano

    - Input: $0.10

    - Cached Input: $0.025

    - Output: $0.40
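
    As a sanity check on those rates, here is a small sketch estimating the cost of a single gpt-4.1 request (the token counts are made up, and it assumes cached tokens are counted as part of the prompt):

        # USD per 1M tokens, from the gpt-4.1 list above.
        PRICE_INPUT, PRICE_CACHED_INPUT, PRICE_OUTPUT = 2.00, 0.50, 8.00

        def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
            # Cached tokens bill at the cached rate; the rest of the prompt at the full rate.
            return (
                (input_tokens - cached_tokens) * PRICE_INPUT
                + cached_tokens * PRICE_CACHED_INPUT
                + output_tokens * PRICE_OUTPUT
            ) / 1_000_000

        # e.g. a 100k-token prompt with 80k of it cache-hit, plus 2k output tokens:
        print(f"${request_cost(100_000, 80_000, 2_000):.4f}")  # -> $0.0960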



    The cached input price is notable here: previously, with GPT-4o, it was 1/2 the cost of raw input; now it's 1/4.

    It's still not as notable as Claude's 1/10th the cost of raw input, but it shows OpenAI is making improvements in this area.



    The big change in this announcement is the 1M context window on all models.

    But the price is what matters.



    Nothing compared to Llama 4's 10M. What matters is how well the model performs with such a long context, not what the technical maximum is.


    It seems that OpenAI is really differentiating itself in the AI market by developing the most incomprehensible product names in the history of software.


    They learned from the best: Microsoft


    GpTeams Classic


    GPT 4 Workgroups


    "Hey buddy, want some .Net, oh I mean dotnet"


    I wonder how they decide whether the o or the digit comes first (e.g., o3 vs 4o).


    I need an AI to understand the naming conventions that OpenAI is using.


    > We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months, on July 14, 2025, to allow time for developers to transition.

    Well, that didn't last long.



    So we're going back... .4 of a GPT? Make it make sense, OpenAI...





