We ran over 600 image generations to compare AI image models

Original link: https://latenitesoft.com/blog/evaluating-frontier-ai-image-generation-models/

## AI Image Generation Model Comparison: An In-Depth Analysis

LateNiteSoft, the developer of iOS photo apps such as Camera+, ran extensive testing (over 600 generations) to determine which AI models work best for various kinds of image edits. Needing a sustainable pay-per-use billing system (CreditProxy) rather than relying on free tiers, they evaluated OpenAI's gpt-image-1, Gemini, and Seedream.

Their tests focused on prompts that simulate typical user requests: classic filters, style variations (watercolor, anime), and generative edits (adding portals, futuristic elements). The results show that **no single model excels across the board.** **OpenAI** shines at creative, transformative edits such as style transfer, but frequently introduces "AI slop" and distorted details. **Gemini** excels at photorealistic edits and preserves detail, but can be overly conservative and sometimes refuses to edit at all, especially with portraits. **Seedream** offers a balance, performing well across categories, and may be a cost-effective alternative to OpenAI.

Generation times were fairly consistent across models. LateNiteSoft is exploring a "prompt classifier" to automatically select the best model for each user request, and plans to offer CreditProxy as a service. Their full comparison, including extensive image examples, is available in detail below.

A recent comparison of AI image generation models (OpenAI, Gemini, and Seedream), involving over 600 image generations, revealed each model's distinctive traits. OpenAI was noted for frequently altering facial features, sometimes pushing them toward an "averaged" look, and for a consistent yellow tint. Gemini often refuses to apply edits, particularly to human faces, while Seedream is unique in outputting images at 4K resolution. Commenters observed that OpenAI tends to modify details even when explicitly asked not to, possibly due to "safety" measures. Midjourney is easily recognizable by its cel-shaded style. While some like OpenAI's overall aesthetic, others find it too heavy-handed. The discussion also touched on the future of artists and illustrators in the AI era, with views ranging from AI as a productivity tool to potential job displacement. There is growing interest in *integrating* AI into existing creative workflows rather than generating entire images, and a desire for more user-friendly local image generation software.

Original article

tl;dr: We’ve been making photo apps for iOS for long enough that we have gray hairs now, and using our experience we ran over 600 image generations to compare which AI models work best for which image edits. If you want, you can jump right to the image comparisons, or the conclusion, but promise us you won’t post presumptuous comments on Hacker News until you’ve also read the background!

Background

Hi! We’re LateNiteSoft, and we’ve been working on photography-related iOS apps for 15 years now. Working on market-leading apps such as Camera+, Photon and REC, we’ve always had our finger on the pulse of what users want out of their mobile photography.

With the ground-breaking release of OpenAI’s gpt-image-1 image generation model earlier this year, we started investigating all the interesting use cases we could think of for AI image editing.

But as a company that has never taken any venture capital investment, all our products have to pay for themselves. We’re in it to delight our users, not just capture market share and sell them out. When considering AI projects, one thing has been clear – we can’t take the AI startup road where you have a generous free tier, charge an unreasonably small monthly fee for “unlimited”, and hope you’re going to make it up on scale (code for “someone please acquire us”).

All the AI-focused billing systems we could find out there were based on this model: claim unlimited access, then sandbag users with “fair use” clauses that prevent any actual unlimited usage (which is, obviously, untenable, since you’ll end up with one $20/mo user reselling to everyone else).

Since we want to fairly charge our customers for what they actually use, we’ve built a credit-based “pay per generation”-style billing system (that internally we’ve been calling CreditProxy). We’ve also been planning on providing this as a service, since nobody else seems to be doing it, so if you’re interested in being a trial user, get in touch!
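To make the credit-based idea concrete, here is a minimal sketch of per-generation credit accounting, assuming an up-front charge before each model call. The type, method names and numbers are purely illustrative and are not CreditProxy's actual API:

```swift
// Hypothetical sketch of a credit-based, pay-per-generation ledger.
// The type, method names and numbers are illustrative, not CreditProxy's API.
enum BillingError: Error {
    case insufficientCredits(required: Int, available: Int)
}

struct CreditLedger {
    private(set) var balance: Int

    // Deduct the cost of one generation up front, before calling the model.
    mutating func charge(credits cost: Int) throws {
        guard balance >= cost else {
            throw BillingError.insufficientCredits(required: cost, available: balance)
        }
        balance -= cost
    }
}

var ledger = CreditLedger(balance: 50)
do {
    try ledger.charge(credits: 12)                // e.g. one high-quality generation
    print("Remaining credits:", ledger.balance)   // 38
} catch {
    print("Time to buy more credits:", error)
}
```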

We released our app MorphAI as a public proof of concept to give CreditProxy a proper real-world test, and we’ve marketed it to users of Camera+, which already offers traditional photo-editing functionality, including a whole host of popular photo filters, giving us a built-in audience of customers ready for the next step in image editing.

With the release of newer models like nanoBanana and Seedream, we’ve had to consider which models make sense to support. We need to explore the trade-offs between quality, prompt complexity, and pricing.

A couple of hastily hacked-together scripts and many, many AI generation credits later, we have some results! So that everyone else doesn’t have to waste their money too, we figured we’d share what we found below.
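For the curious, those scripts boil down to a nested loop over models, prompts, and test photos. Here is a simplified sketch; the generate function and the photo file names are hypothetical stand-ins, not our actual script:

```swift
import Foundation

let models = ["gpt-image-1", "gemini-2.5-flash-image", "seedream-4-0-250828"]
let prompts = ["Grungy vintage photo", "Transform into watercolor painting",
               "Remove background"]  // ...and so on
let testPhotos = ["pets.jpg", "kids.jpg", "landscape.jpg", "car.jpg", "product.jpg"]

// Hypothetical stand-in for a single provider call; the real request and
// response formats differ per API, so treat this as a placeholder.
func generate(model: String, prompt: String, photo: String,
              quality: String? = nil) async throws -> Data {
    // ...upload the photo, send the prompt, return the generated image bytes...
    return Data()
}

// Run every prompt against every test photo on every model and save the results.
func runComparison() async {
    for model in models {
        for prompt in prompts {
            for photo in testPhotos {
                do {
                    let image = try await generate(model: model, prompt: prompt, photo: photo)
                    let name = "\(model)-\(photo)-\(prompt).png"
                    try image.write(to: URL(fileURLWithPath: name))
                } catch {
                    print("Failed: \(model) / \(prompt) / \(photo): \(error)")
                }
            }
        }
    }
}

// Kick it off from an async context, e.g.: await runComparison()
```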

The Tests

Based on our experience with Camera+ and the kind of edits our users have been making with MorphAI, we picked a host of somewhat naive prompts. Veteran Midjourney users may scoff at these, but in our experience these are the kinds of prompts that our average user is likely to use.

As for test photos, we chose some representative things people like to take photos of: their pets, their kids, landscapes, their cars, and product photography.

Image generation times are also relevant. During our test period, the generation time for all models was fairly consistent, and didn’t vary by image or prompt complexity.

| OpenAI (High) | Gemini | Seedream |
| --- | --- | --- |
| 80 seconds | 11 seconds | 9 seconds |

OpenAI also has a quality setting; the images included here were all generated at High quality, but we also tested Medium, and those generations averaged 36 seconds. We can include the Medium quality images as well if there is any interest!
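For completeness, the timings above come down to simple wall-clock measurement around each call. A self-contained helper along these lines would do the job; the label and the wrapped call in the comment are illustrative, reusing the hypothetical generate function from the earlier sketch:

```swift
import Foundation

// Generic wall-clock timer for any async generation call.
func measure<T>(_ label: String, _ work: () async throws -> T) async rethrows -> T {
    let start = Date()
    let result = try await work()
    print("\(label): \(String(format: "%.1f", Date().timeIntervalSince(start))) s")
    return result
}

// Usage, wrapping the hypothetical generate call from the earlier sketch:
//   let image = try await measure("gpt-image-1 / high") {
//       try await generate(model: "gpt-image-1", prompt: "Grungy vintage photo",
//                          photo: "landscape.jpg", quality: "high")
//   }
```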


Classic filters

These are the types of filters that we used to implement manually, by painstakingly hand-crafting textures and Photoshop layers and then converting those to Objective-C code. Now all you need is a few words to a language model (and to burn down half of a rain forest or so; just the cost of progress).

Our conclusion for this category is that for photorealistic filters like these, Gemini really shines by preserving details from the original and minimizing hallucinations, but often at the expense of the strength and creativity of the effect. Especially with photos of people, Gemini seems to refuse to apply any edits at all, with a strong bias towards photorealism.

OpenAI really likes to mess with the details of the photo, giving a characteristic “AI slop” feel, which can be a deal breaker on things like human faces.

Grungy vintage photo

Use soft, diffused lighting

Transform into a kaleidoscopic pattern

Gemini took some really odd shortcuts in generating some of these!

Apply a heat map effect

It’s clear that none of the models actually has a concept of what generates heat here, aside from Seedream knowing that humans generate heat, revealing how much the models struggle without any ground truth.

Make it look like a long exposure photograph

This is an interesting test since in some of the sample photos a long exposure doesn’t make sense. In the ones where it makes the most sense (the landscape and the car), OpenAI did the best, but on the other hand it completely messed up the cats and the product, and the portrait photo turned into a trippy art piece.

Gemini, maybe logically, did nothing. Seedream liked adding light streaks as if a car drove past, with only the portrait photo seemingly making any sense.

Pinhole camera

In this case, it was funny to watch Gemini take a literal approach and generate actual pictures of cameras! For this reason we re-worked this prompt by just adding the word “effect”.

Pinhole camera effect

Gemini liked to generate a literal pinhole camera here so we tried modifying the prompt.

Add a layer of fog or mist

Make it look like it’s golden hour

Make it look like it’s etched in glass

With this prompt, there is ambiguity in what “it” is, so we tried a reworded prompt as well. Only OpenAI consistently knew what a traditional etched glass effect looks like. Seedream’s glass item effect looks really cool!

Make it look like the photo is etched in glass

Gemini has a really interesting interpretation here! And Seedream had some pretty fantastic results.

Remove background

This is a classic job people have spent their lives doing manually in Photoshop since the early ’90s. But what is a “background”, really? Is the ground in front of a car the “background”? We also retried this with a tweaked prompt.

OpenAI’s “sloppification” of the details of objects makes it useless for this purpose.

Isolate the object

With the tweaked prompt, Gemini’s API actually returned a follow-up question: “Which object would you like to isolate? There are two cats in the image.”, which our generation script was not prepared to handle! So it is missing from this comparison.
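A defensive way to handle this is to accept text parts as well as image parts when parsing the response. Here is a simplified sketch; the response shape below is an assumption for illustration, not our production code:

```swift
import Foundation

// Simplified (and assumed) shape of an image-generation response whose parts
// can be either inline image data or plain text, such as a follow-up question
// like "Which object would you like to isolate?".
struct GeminiResponse: Decodable {
    struct Candidate: Decodable { let content: Content }
    struct Content: Decodable { let parts: [Part] }
    struct Part: Decodable {
        struct InlineData: Decodable { let mimeType: String; let data: String }
        let text: String?
        let inlineData: InlineData?
    }
    let candidates: [Candidate]
}

enum GenerationResult {
    case image(Data)               // decoded image bytes
    case followUpQuestion(String)  // the model asked for clarification instead
}

func parseGeneration(_ body: Data) throws -> GenerationResult {
    let response = try JSONDecoder().decode(GeminiResponse.self, from: body)
    let parts = response.candidates.first?.content.parts ?? []
    // Prefer an image part if one exists; otherwise surface whatever text came back.
    if let inline = parts.compactMap({ $0.inlineData }).first,
       let bytes = Data(base64Encoded: inline.data) {
        return .image(bytes)
    }
    return .followUpQuestion(parts.compactMap { $0.text }.joined(separator: " "))
}
```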

Give it a metallic sheen

Another case where “it” is vague and we can retry with a more specific prompt. The product imagery is another case where Seedream created a really stunning result, even adding a reflection of someone taking the photo with their phone!

Give the object a metallic sheen

Modifying the prompt here really only changed OpenAI’s interpretation.

Lens effects

One of the filter packs we had worked on for Camera+ using traditional methods was a lens effect filter pack. But unlike traditional edits, with generative AI you can also create wide-angle lens effects that can just make up the portions of the image that the camera couldn’t capture.

This is another category where it’s very visible how OpenAI regenerates and hallucinates all the details in a picture, whereas Gemini’s and Seedream’s results are very faithful to the original and look more like actual lens permutations.

Apply a fish-eye lens effect

Strong bokeh blur

It was pretty surprising how poorly the models did here considering how common this must be in the training data. OpenAI gives a strong blur but no bokeh effects. Gemini gives us a bunch of random circles in front of the image, demonstrating an understanding of what people want out of a bokeh filter but not how it works. Seedream does really well here.

Apply a Dutch angle (canted frame)

OpenAI really lost its mind here on the car photo.

Change to a bird’s-eye view

Style transfer

Style transfer is the process of applying an artistic style to a photo. This technique predates the current AI models by quite a few years, with popular apps generating Van Gogh paintings out of your photos. We were also early in attempting style transfer for our apps; shout out to Noel’s Intel iMac, which had to run at full blast all night just to generate a 256x256px image, since it was our only machine with a compatible GPU.

While Gemini was good at preserving reality in the more photorealistic effects in the previous section, when it comes to the more artistic styles, OpenAI has it beat. Gemini keeps things far too conservative, especially with photos of a human in them, where it sometimes seems to do nothing at all. Is this some kind of safety guardrail?

Draw this in the style of a Studio Ghibli movie

ChatGPT went viral with this prompt, with Sam Altman even making it his profile picture on X. And OpenAI keeps the crown – is Google too conservative in order to avoid a lawsuit? Seedream makes an attempt, but its results just end up looking like “generic anime”.

Transform into watercolor painting

Make it look like a pastel artwork

Transform into Art Nouveau style

Apply a ukiyo-e Japanese woodblock print style

A very stark example of Gemini failing to apply a style to photos with humans. This is a prompt where Seedream knocked it out of the park, perhaps suggesting that a larger portion of its training data was sourced from Asian cultures than that of the Western models.

Transform into low poly art

Seedream blows everyone else away here.

Portrait effects

For prompts about human appearance, we have only applied them to the portrait photo.

Make it look like a caricature

Seedream seems to be biased towards Asian culture, giving an anime look instead of a Western-style cartoon caricature.

Turn them into an action figure in the blister pack

OpenAI’s style here went viral a while back, but Gemini is stunningly realistic. Seedream is a weird mix of realistic and hallucinations.

Generative edits

The place where generative AI really shines is when it can show off some creativity, and these were some prompts we added as suggestions in MorphAI to showcase that and inspire our users. OpenAI still seems to win here.

Create a 70’s vinyl record cover

This is an example of a prompt that had a small viral moment with OpenAI, but the other models can’t even get the aspect ratio right.

Introduce mythical creatures native to this environment

This one showcases OpenAI’s creativity. Gemini seems kind of creepy?

Add a mystical portal or gateway

Gemini replacing the face with a portal is certainly a choice!

Incorporate futuristic technology elements

Another example of OpenAI being far more creative and willing to re-do the whole image.

Make it look whimsical and enchanting

This one also shows OpenAI being more artistic, and Gemini being more realistic while still trying to incorporate the prompt.

Transform the scene to a stormy night

Conclusion

If you made it all the way down here you probably don’t need a summary, but for our purposes, we’ve at least concluded that there is no one-size-fits-all model at this point.

OpenAI is great for fully transformative filters like style transfer or more creative generative applications, whereas Gemini works better for more realistic edits. Seedream lies somewhere in the middle and is a bit of a jack of all trades, and for the price and performance may be a good replacement for OpenAI.

We’ve been experimenting with a “prompt classifier” to automatically choose a model – sending artistic prompts to OpenAI and more realistic prompts to Gemini. If there’s any interest we can follow up with how that worked out!
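As a rough illustration of the idea (and emphatically not our actual classifier), a naive keyword-based router might look something like this, with purely illustrative keyword lists:

```swift
enum ImageModel: String {
    case openAI = "gpt-image-1"
    case gemini = "gemini-2.5-flash-image"
    case seedream = "seedream-4-0-250828"
}

// Naive keyword router, purely illustrative: artistic/transformative prompts
// go to OpenAI, realism-preserving edits to Gemini, and everything else to
// Seedream as the jack-of-all-trades default.
func routeModel(for prompt: String) -> ImageModel {
    let p = prompt.lowercased()
    let artistic = ["style", "painting", "watercolor", "anime", "art", "whimsical"]
    let realistic = ["remove background", "isolate", "lighting", "golden hour", "fog"]

    if artistic.contains(where: { p.contains($0) }) { return .openAI }
    if realistic.contains(where: { p.contains($0) }) { return .gemini }
    return .seedream
}

print(routeModel(for: "Transform into watercolor painting"))  // openAI
print(routeModel(for: "Remove background"))                   // gemini
```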

Methodology

Tests were performed on October 8 with gpt-image-1, gemini-2.5-flash-image and seedream-4-0-250828.

Timings were measured on a consumer internet connection in Japan (Fiber connection, 10 Gbps nominal bandwidth) during a limited test run in a short time period.
