GPT Image 2 Text in Images: What Works & What Doesn’t

For five years, the running joke about AI image generation was simple: it can paint anything except letters. Posters had garbled headlines. Business cards had nonsense names. Infographics looked like they were written in Wingdings.

GPT Image 2 is the first model where that joke stops being true — most of the time. Not always. There are patterns where it still fails, and patterns where it’s reliable enough to ship a finished asset on the first try. This post is the practical breakdown of which is which.

Why text in images was hard, briefly

Earlier models like DALL·E 3 and Midjourney were trained on images plus captions, but they treated letters as visual shapes rather than meaningful symbols. They’d learn that posters tend to have something that looks like a word at the top — and they’d invent something letter-shaped that wasn’t a word.

GPT Image 2 is trained differently: the model has a much stronger grasp of glyphs, spelling, and how text behaves in real layouts. That doesn’t make it magic — it just means short, clear, specific text instructions now produce real text.

What works reliably

The patterns we’ve seen succeed on the first or second generation:

Short headlines. Up to about 12 words. Movie posters, event flyers, billboard ads, store signs. Things like “Summer Sale: 30% Off” or “Now Open” come out clean almost every time.

Two-line layouts. A big headline plus a smaller subtitle — the most common poster pattern — works extremely well. Specify both lines clearly and the model handles hierarchy without help.

Multilingual text. This is one of GPT Image 2’s genuinely strong points. Chinese 新年快乐, Japanese 春の桜, Korean 환영합니다, Arabic مرحبا — these render correctly, alongside English in the same image. Most other models can’t do this at all.

Labeled diagrams and infographics. “Step 1 / Step 2 / Step 3” cards, simple bar charts with axis labels, anatomy diagrams with arrow callouts. The labels stay readable and in the right place.

Stylized typography. Vintage neon signs, hand-painted café boards, retro arcade screens, comic book speech bubbles. If you describe the typographic style clearly, the model matches it.

What still fails (be ready)

The honest list:

Paragraphs. Anything longer than ~40 words inside an image still degrades. Use captions outside the image instead, or stitch two generations together.

Tiny text. A serial number on the side of a product, microscopic legal disclaimers — these still come out as letter-shaped noise.

Exact brand reproductions. It can make a logo. It cannot reproduce your specific logo exactly. Use the editing flow to paste your real logo in, or supply the real logo as a reference image.

Tables with many rows. A 3-row comparison table works. A 15-row spreadsheet doesn’t. The model loses cell alignment.

Numbers in complex layouts. Stock charts, dense data tables, financial summaries — error rate climbs fast. For data viz, generate a clean background and add the chart in your design tool.
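
The limits above can be turned into a quick pre-flight check on a prompt before spending credits. The sketch below is an illustration only: the function names are hypothetical, the 40-word threshold comes from the rough figure above, and the 3-row table limit is an assumption based on the one table size reported to work. Words are counted by whitespace, so it undercounts languages written without spaces.

```python
import re

# Rough guides taken from the limits described above, not hard rules.
MAX_WORDS_IN_IMAGE = 40
MAX_TABLE_ROWS = 3  # assumption: the only table size the post reports as working


def quoted_strings(prompt: str) -> list[str]:
    """Return every span wrapped in straight double quotes."""
    return re.findall(r'"([^"]*)"', prompt)


def preflight_warnings(prompt: str, table_rows: int = 0) -> list[str]:
    """Flag requests that fall into the known failure patterns."""
    warnings = []
    # Count whitespace-separated words across all quoted (in-image) text.
    total_words = sum(len(text.split()) for text in quoted_strings(prompt))
    if total_words > MAX_WORDS_IN_IMAGE:
        warnings.append(
            f"{total_words} words of in-image text; move body copy to a caption"
        )
    if table_rows > MAX_TABLE_ROWS:
        warnings.append(
            f"{table_rows} table rows; build the table in a design tool"
        )
    return warnings


short_prompt = 'A vintage diner sign, "OPEN 24 HOURS" in red neon lettering'
short_result = preflight_warnings(short_prompt)  # no warnings for a 3-word sign
```

An empty list means the request stays inside the patterns that tend to work. It does not guarantee a clean render.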

Prompt patterns that work

A few that we use over and over.

Pattern 1: explicit quotes for the exact text

Always put the exact words in quotes. Don’t describe what the text “should say” in a paraphrase — say it literally.

A vintage diner sign hanging by chains, "OPEN 24 HOURS" in red neon lettering, dark city street at night, soft rain reflections on the pavement.
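
If prompts are assembled in code, the quoting rule is easy to enforce with a small helper. This is a minimal sketch; `with_exact_text` is a hypothetical name, not part of any SDK, and it only builds the prompt string.

```python
def with_exact_text(scene: str, text: str, style: str) -> str:
    """Build a prompt that states the exact words in double quotes."""
    if '"' in text:
        # A stray quote would blur where the literal text starts and ends.
        raise ValueError("text must not contain double quotes")
    return f'{scene}, "{text}" in {style}'


prompt = with_exact_text(
    "A vintage diner sign hanging by chains",
    "OPEN 24 HOURS",
    "red neon lettering",
)
# prompt == 'A vintage diner sign hanging by chains, "OPEN 24 HOURS" in red neon lettering'
```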

Pattern 2: position cues for layouts

Tell the model where each element goes. Top, center, bottom, lower-right. The model respects these almost every time.

A movie poster, title "Journey to the Stars" at the top in bold serif white text, subtitle "Coming Summer 2026" at the bottom, central image of a lone astronaut looking up at a swirl of galaxies.
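
For layouts with several text elements, it can help to keep each element's words, position, and style together and generate the prompt from that structure. The sketch below is one way to do it; `TextElement` and `layout_prompt` are hypothetical names, and the clause order simply mirrors the poster example above.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    role: str        # e.g. "title", "subtitle"
    text: str        # exact words to render
    position: str    # e.g. "at the top", "at the bottom"
    style: str = ""  # optional typography hint


def layout_prompt(subject: str, elements: list[TextElement], scene: str) -> str:
    """Join subject, one clause per text element, and scene into a prompt."""
    parts = [subject]
    for el in elements:
        clause = f'{el.role} "{el.text}" {el.position}'
        if el.style:
            clause += f" in {el.style}"
        parts.append(clause)
    parts.append(scene)
    return ", ".join(parts) + "."


poster = layout_prompt(
    "A movie poster",
    [
        TextElement("title", "Journey to the Stars", "at the top",
                    "bold serif white text"),
        TextElement("subtitle", "Coming Summer 2026", "at the bottom"),
    ],
    "central image of a lone astronaut looking up at a swirl of galaxies",
)
```

With these inputs, `poster` reproduces the movie poster prompt shown above word for word.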

Pattern 3: language hints for multilingual content

For non-English text, say what language it is. The model will choose appropriate glyphs and styling.

A Chinese New Year greeting card, large red Chinese calligraphy "新年快乐" centered, smaller English subtitle "Happy New Year" below in elegant gold script, festive plum-blossom background.
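
When the same creative is produced in several languages, the language hint can be added programmatically. The following is a sketch under stated assumptions: `localized_prompt` is a hypothetical helper, the scene description is made up for illustration, and the sample strings are the ones quoted earlier in this post (they are not translations of one another).

```python
# Sample strings from this post, keyed by language.
# They are examples of each script, not translations of a single headline.
SAMPLES = {
    "Chinese": "新年快乐",
    "Japanese": "春の桜",
    "Korean": "환영합니다",
    "Arabic": "مرحبا",
}


def localized_prompt(language: str, text: str, scene: str,
                     english_subtitle: str = "") -> str:
    """Name the language explicitly and quote the exact text."""
    prompt = f'{scene}, large {language} text "{text}" centered'
    if english_subtitle:
        prompt += f', smaller English subtitle "{english_subtitle}" below'
    return prompt


# One prompt per language, all sharing the same (illustrative) scene.
prompts = {
    language: localized_prompt(language, text, "A festive greeting card")
    for language, text in SAMPLES.items()
}
```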

Pattern 4: typography style direction

Treat the font like part of the description. “Vintage hand-painted,” “sharp tech sans-serif,” “Art Deco metallic,” “neon tube lettering,” “arcade pixel.” The model interprets these well.

A modern conference badge, name "Sarah Chen" in clean sans-serif uppercase, role "Senior Designer" in smaller italic below, simple navy and white color scheme, minimalist layout.

Pattern 5: edit instead of regenerate

If the layout is right but a word is wrong, don’t regenerate from scratch — use the edit feature, point to the bad text, and rewrite it. You keep the look you liked and only fix the broken letters.
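
If edit requests are scripted, a consistent instruction template keeps them narrow. This sketch only builds the instruction text. The wording is an assumption about what a useful edit request looks like, not a documented format, and `edit_instruction` is a hypothetical name.

```python
def edit_instruction(wrong: str, correct: str, location: str) -> str:
    """Describe a text-only fix and ask for everything else to stay put."""
    return (
        f'Change the text {location} from "{wrong}" to "{correct}". '
        "Keep the layout, colors, typography, and everything else unchanged."
    )


fix = edit_instruction("OPNE 24 HOURS", "OPEN 24 HOURS", "on the neon sign")
```

Quoting both the wrong and the corrected text follows the same rule as Pattern 1: the exact words are stated literally rather than paraphrased.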

Real workflows that use this

Where in-image text actually changes the day-to-day:

Social media managers stop bouncing back to Canva for every post. One tool, one prompt, done.
E-commerce teams generate localized ad creative — same product, twelve languages of headline — in minutes instead of days.
Creators make manga panels with dialogue in the bubbles, slide decks with titled diagrams, and YouTube thumbnails with the punchy text actually readable.
Localization teams preview how a marketing campaign will look across markets before paying a designer to redo it twelve times.

Try it on something you actually need

The fastest way to get a feel for this is to take a real image you’re about to make — a thumbnail, a banner, a flyer — and run it through the imagesv2 playground. Use the patterns above. Two or three rounds and you’ll know exactly where GPT Image 2 fits in your workflow.

If you want to validate without commitment, 1,000 credits for $14.90 is enough for several dozen experiments. No subscription, credits never expire.
