Using "underdrawings" for accurate text and numbers

Original link: https://samcollins.blog/underdrawings/

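A new technique called the “underdrawing method” significantly improves the accuracy of text and numbers in AI-generated images, outperforming even recent advances such as ChatGPT-Images-2 and Gemini 3.0 Pro. The method plays to the strengths of different kinds of AI: deterministic tools for precision, generative models for artistry. It is a two-step process. First, use a code-generating AI (such as Claude) to create an outline containing the exact text and numbers (the “underdrawing”) and export it as an image (for example, SVG). Second, feed this underdrawing together with a style prompt into an image generation model (such as Gemini), instructing it to “paint” the desired visual style on top of the accurate base. This approach overcomes the common problem of AI struggling to render numbers and text accurately. It is not foolproof, but it produces noticeably more reliable results, and the author considers the idea simple enough that it will soon be built into standard image generation tools.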

Original text

I discovered a technique for generating reliable text and numbers in AI-generated images.

For example, the following image is considered impossible with state-of-the-art image models. But I made this with Gemini 3.0 Pro (plus one extra step I’m going to explain below).

ChatGPT-Images-2, which was released earlier this week, does a great job with accurate text and numbers.

So I had assumed this technique was now moot. But no! This method still works better than Gemini 3.0 Pro and ChatGPT-Images-2 on their own (see below).

That’s surprising to me. But this is such a simple technique, I’m sure they’ll add something like it soon.


The Underdrawing Method

I’m totally naming it like it’s a thing, but it does seem to be a thing.

Example

It is easiest to see with a baby A/B test that shows the result with and without the underdrawing.

Let’s pick a simple prompt that Gemini and ChatGPT will get the numbers wrong on. They get a lot of text and numbers right these days, so we have to go fairly hard.

Make an image of a game board with 50 stepping stones arranged in a spiral, winding counter-clockwise inward from start at the outside (1) to finish at the centre (50). Each stone is clearly numbered consecutively from 1 to 50. Style: claymation diorama, studio-lit, candy-bright, soft bokeh background.

❌ Gemini 3.0 Pro (without underdrawing)

As expected. Impressive at first glance but falls apart once you start reading.

❌ ChatGPT-Images-2 (without underdrawing)

I was so impressed with the ChatGPT-Images-2 release that I expected it to get this. Very surprising to see it fail in much the same way as Gemini.

✅ Gemini 3.0 Pro (with the underdrawing method)

Bingo. Correct numbers, correct count and sequencing of stones, correct spiral shape.

So how did we do that? One pre-step.

There will be far more intelligent and elaborate ways to do this. This was a quick method I came up with one day while trying to generate an image of a 100-step adventure board for my kid.

Use deterministic and generative machines for what they’re good at

  1. SVG/HTML will make dry visuals but with excellent math and precision
  2. Image Gen models will create stunning visuals but with famously unreliable math/text

So I spent an afternoon figuring out how the genius analyst and the genius artist could work together. Well, obviously Claude did all the work (thank you Claude), but I had some ideas and helped with reading.

“Give it an outline. Ask it to paint on top”

  1. Layer 1: The “underdrawing” (deterministic): Lay out the numbers and text in the correct positions and orientations in whatever language/format you prefer (SVG, Python, Mermaid). You just need to export an image of it that contains the pixels of the numbers/text.

  2. Layer 2: The “painting” (generative): Make an image generation request to Gemini 3.0 Pro or greater (you just need image+text input support) where you’ll include your underdrawing and the prompt for the visual style you want.

Example

Step 1 of 2: generate the numbers/text outline with SVG

Ask Claude Code to generate it for you, and iterate until you’re happy with the wireframe version.

Make an SVG of 50 stepping stones arranged in a spiral, winding counter-clockwise inward from start at the outside (1) to finish at the centre (50), each stone numbered consecutively from 1 to 50. Each stone is a different shape: circle, square, triangle, hexagon.
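For a sense of what a coding assistant might hand back for that prompt, here is a minimal Python sketch that builds the same kind of underdrawing. It is an illustration, not the code behind the images in this post: the function names (`spiral_positions`, `polygon`, `build_underdrawing`), the canvas size, the number of turns and the stone radius are all assumptions. The point is that every label and position comes from plain arithmetic, so stone 37 is always labelled 37.

```python
import math

SHAPES = ["circle", "square", "triangle", "hexagon"]
# (number of sides, rotation in degrees) for the non-circular stones
SIDES = {"square": (4, 45.0), "triangle": (3, 90.0), "hexagon": (6, 0.0)}


def spiral_positions(n=50, size=1000, turns=3.5, r_outer=460.0):
    """Centres for n stones: stone 1 outermost, stone n at the centre.

    Angle and radius both scale with sqrt(t), which spaces the stones
    roughly evenly along an Archimedean spiral. The angle shrinks as the
    numbers grow; with SVG's y axis pointing down and y = c + r*sin(theta),
    that winds counter-clockwise on screen.
    """
    c = size / 2
    theta_max = 2 * math.pi * turns
    centres = []
    for i in range(n):
        t = 1 - i / max(n - 1, 1)  # 1 at the outside, 0 at the centre
        theta = theta_max * math.sqrt(t)
        r = r_outer * math.sqrt(t)
        centres.append((c + r * math.cos(theta), c + r * math.sin(theta)))
    return centres


def polygon(cx, cy, r, sides, rot_deg):
    """SVG <polygon> for a regular polygon with circumradius r."""
    pts = []
    for k in range(sides):
        a = math.radians(rot_deg) + 2 * math.pi * k / sides
        pts.append(f"{cx + r * math.cos(a):.1f},{cy - r * math.sin(a):.1f}")
    joined = " ".join(pts)
    return f'<polygon points="{joined}" fill="white" stroke="black" stroke-width="3"/>'


def build_underdrawing(n=50, size=1000, stone_r=30):
    """Return the whole underdrawing as one SVG string."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">',
        f'<rect width="{size}" height="{size}" fill="white"/>',
    ]
    for number, (x, y) in enumerate(spiral_positions(n, size), start=1):
        shape = SHAPES[(number - 1) % len(SHAPES)]  # cycle through the four shapes
        if shape == "circle":
            parts.append(
                f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{stone_r}" '
                f'fill="white" stroke="black" stroke-width="3"/>'
            )
        else:
            sides, rot = SIDES[shape]
            parts.append(polygon(x, y, stone_r, sides, rot))
        # The label is the loop counter itself, so it cannot drift or repeat.
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" font-size="20" '
            f'font-family="sans-serif" text-anchor="middle" '
            f'dominant-baseline="central">{number}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Saving the string returned by `build_underdrawing()` as an `.svg` file and rasterising it with any SVG renderer gives you the image to pass along in step 2.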

Step 2 of 2: Pass the underdrawing + prompt to image generation

Ask Claude to pass the SVG you made in the prior step to Gemini 3.0 Pro, along with a prompt to transform it visually without changing the numbers, e.g.

Transform this image into a photographed claymation diorama of assorted artisan chocolates and candies, arranged in a spiral path winding counter-clockwise inward from start (1) at the outside to finish (50) at the centre, viewed from a low-angle tilted perspective. 
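If you would rather wire this step up yourself, the request boils down to one image plus one piece of text. The sketch below only shows that bundling. The function name `build_paint_request` and the field names (`prompt`, `image`, `mime_type`, `data_base64`) are hypothetical and do not match any particular provider's schema, so map them onto whatever your image model's API actually expects.

```python
import base64
import json


def build_paint_request(underdrawing_png: bytes, style_prompt: str) -> dict:
    """Bundle the underdrawing image and the style prompt into one body.

    Hypothetical schema: rename the fields to match your provider's API.
    """
    instruction = (
        style_prompt.strip()
        + " Keep every number and label exactly as it appears in the"
        " attached image, in the same position and order."
    )
    return {
        "prompt": instruction,
        "image": {
            "mime_type": "image/png",
            # Binary image data is base64-encoded so it can travel as JSON.
            "data_base64": base64.b64encode(underdrawing_png).decode("ascii"),
        },
    }


# Placeholder bytes stand in for a real rasterised underdrawing.
example = build_paint_request(
    b"placeholder-png-bytes",
    "Transform this image into a photographed claymation diorama.",
)
body = json.dumps(example)  # ready to send with the HTTP client of your choice
```

The extra sentence appended to the prompt is the “without changing the numbers” part of the method. The image carries the layout, and the text tells the model to leave it alone.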

That’s it

It isn’t hard. By now Claude Code or Codex can do every step of that for you.

Note also that it won’t be perfect every time. Thank you for the reality check, 71.
