Turn any snapshot into a Japanese-style diary cover — 30 seconds

Turn any snapshot into a Japanese-style annotated journal

Upload anything — food, coffee, a street corner, your pet, your desk.
The AI adds hand-drawn outlines and tiny handwritten thoughts to the objects in your photo, turning a forgettable snapshot into your diary cover.

Free credits for new users: upload a photo and try it out

What is handwritten photo annotation?

Handwritten photo annotation (Japanese: 手描き吹き出し) is a photo-doodle aesthetic that spread from Japanese Instagram in late 2025: take any everyday snapshot and overlay a thin hand-drawn layer of object outlines plus tiny diary-style notes. This page packages the effect into a tuned prompt and 5 ready-to-use scene templates — upload any photo and hit generate.

Adaptive line color

White strokes on dark backgrounds, dark strokes on light ones — the model picks the contrast automatically. The original viral prompt only worked on dark night-market scenes; we fixed that.
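
The model makes this call on its own, but the rule it follows is easy to picture. Here is a minimal Python sketch, assuming a plain brightness check. The `pick_stroke_color` name, the brightness weights, and the 128 threshold are illustrative assumptions, not something the generator exposes.

```python
def pick_stroke_color(pixels, threshold=128.0):
    """Return "white" for a dark background and "dark" for a light one.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    threshold: assumed cut-off on the 0-255 brightness scale.
    """
    total = 0.0
    count = 0
    for r, g, b in pixels:
        # Weighted brightness: green counts most, blue least.
        total += 0.2126 * r + 0.7152 * g + 0.0722 * b
        count += 1
    if count == 0:
        raise ValueError("need at least one pixel")
    return "white" if total / count < threshold else "dark"


# Hypothetical samples: a neon night street and a pale wooden desk.
night_market = [(12, 10, 40), (30, 0, 60), (5, 5, 20)]
wood_desk = [(230, 215, 190), (240, 228, 205)]
print(pick_stroke_color(night_market))
print(pick_stroke_color(wood_desk))
```

If you want to guess which way a photo will go before uploading, sample a few background pixels and run them through a check like this.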

Diary-style micro-thoughts

"so refreshing~", "soft and warm", "kind of happy today" — short inner-monologue lines, not dry product labels. The model writes them as if you're whispering to your journal.

8–12 annotations per image

The prompt explicitly asks for 8–12 distinct elements to be annotated. The result is dense and journal-rich, not three sparse labels on an empty image.

Original photo preserved

The prompt instructs the model to add an annotation layer only, so your face, food, and composition are meant to stay as shot. The model is told to draw on top, not to redraw the photo.

Six scene families that work best

This effect thrives on object-rich scenes. The more discrete things in the frame, the more the AI has to label.

Food check-in

Brunch plates, hot pot, milk tea, café desserts. Every dish gets a texture note, every drink a temperature & mood note — 'soft', 'just sweet enough', 'so refreshing~'.

Café / desk

Coffee cup, laptop, plant, notebook, headphones. A natural multi-object setup. Notes lean wistful: 'coffee's gone cold', 'cat's still asleep', 'this book is hard to get into'.

Street / night market

Neon signs, food stalls, crowds, menus. The original viral context — neon backgrounds make white pen lines pop with maximum contrast.

Pet portraits

Cats, dogs, toys, blankets. Pet annotations are extra cute — 'soft ears', 'sleepy eyes', 'tiny bit better behaved today~'.

Travel diary

Hotel room, airline meal, train station, scenic spots. Turn a roll of trip photos into a hand-bound journal where every shot captures a passing mood.

Convenience store / supermarket

Shopping cart, snack shelves, fridge cases. Mundane scenes loaded with annotation potential — every snack gets a name and a tiny opinion.

5 scene templates — upload, copy, generate

Each card maps to a typical input photo. Click to open the full prompt, then hit "Use this scene" to push it into the generator above. Upload your own photo and hit generate.

Where you can use the output

Annotated photos are ready-made content: a regular snapshot becomes a publish-ready card with image and text in one.

Instagram cover

Portrait + handwritten captions are exactly what social cover formats love. The annotation already carries the caption — no extra design or typography needed.

Instagram Story

9:16 portrait output drops straight into IG Story / TikTok vertical. Looks more elevated than IG's built-in stickers, more handcrafted than Canva templates.

Personal journal post

Drop into your blog, Substack, or photo dump. The annotations already convey emotion — your accompanying caption can be a single line.

Newsletter / blog illustration

Food blogs, city walking essays, lifestyle newsletters — this style fills the 'human-feeling illustration' gap that stock photos can't.

What this page gives you

GPT Image 2 + a tuned prompt + 5 scene templates + transparent credits. No prompt-writing or design skills required to reproduce the original viral feel.

Powered by GPT Image 2

Best-in-class for in-image text rendering. At high quality + larger sizes, handwritten English / Japanese / Chinese characters render legibly — the only model that makes this style work.

Prompt pre-filled

Upload a photo and hit generate immediately, no prompt-writing needed. Want a different style / language / density? Copy any of the 5 scene templates and overwrite.

Faces and composition preserved

The prompt explicitly tells the model not to modify faces or the main subject — only add an annotation layer. Your face stays. Food stays. Composition stays.

Up to 8 outputs per run

Generate 4–8 variants from the same photo and the same prompt, then pick the one with the most charming notes. The model writes new micro-thoughts each time.

Transparent credits, free for new users

Credit cost is shown above the generate button before you click. New users get free credits — enough to try all 5 scenes before deciding.

Handwritten Annotation — FAQ

Common questions about this prompt and GPT Image 2 generation. Other questions: support@imagesv2.ai.

Will the AI change my face?

No. The prompt explicitly tells the model not to modify faces or the main subject and to add only an annotation layer. The model draws around faces, not on them. If you're cautious, pick a photo where faces are small or off-frame.


What kind of photo works best?

Three traits: (1) Object-rich: 8+ labelable elements (a meal, a street, a desk). (2) Background contrast: dark neon or light wood both work; flat white walls don't. (3) Already-clear composition. The original viral hits were Changsha night markets, which checked all three boxes.


Can I get annotations in Japanese or Chinese?

Yes. The 5 scenes ship with English prompts. For Japanese annotations, swap the [Text rules] block to 'Japanese hiragana / katakana text only'. For Chinese, switch to '简体中文 only, 汉字工整' (Simplified Chinese only, neatly written characters). The Chinese version of this page has Chinese-language scene prompts ready to copy.
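
If you edit prompts in bulk, the swap can be scripted. This is a rough Python sketch, assuming the prompt is plain text with a `[Text rules]` header line followed by rule lines and then the next bracketed header. That layout, the `swap_text_rules` helper, and the sample prompt are hypothetical. Only the two replacement strings come from this page.

```python
# Replacement rules quoted from this page.
TEXT_RULES = {
    "ja": "Japanese hiragana / katakana text only",
    "zh": "简体中文 only, 汉字工整",
}


def swap_text_rules(prompt: str, language: str) -> str:
    """Replace the rule lines under [Text rules] with the rule for `language`."""
    rule = TEXT_RULES[language]
    out = []
    in_block = False
    replaced = False
    for line in prompt.splitlines():
        stripped = line.strip()
        if stripped == "[Text rules]":
            out.append(line)
            out.append(rule)
            in_block = True
            replaced = True
            continue
        if in_block:
            if stripped.startswith("["):
                in_block = False  # reached the next block header
            elif stripped:
                continue  # drop the old rule line, keep blank separators
        out.append(line)
    if not replaced:
        raise ValueError("prompt has no [Text rules] block")
    return "\n".join(out)


# Hypothetical prompt layout, for illustration only.
sample_prompt = (
    "[Style]\n"
    "thin hand-drawn pen lines\n"
    "\n"
    "[Text rules]\n"
    "English handwriting only\n"
    "\n"
    "[Density]\n"
    "Annotate 8–12 distinct elements"
)
print(swap_text_rules(sample_prompt, "ja"))
```

Everything outside the `[Text rules]` block is left untouched, so the style and density instructions carry over as they were.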


Why does the handwriting sometimes come out as gibberish?

GPT Image 2 at medium quality + 1024×1024 occasionally renders pseudo-characters (looks like text, isn't). Fix: (1) bump quality to high in the generator, (2) pick 1024×1536 or 1536×1024 for larger glyphs, (3) regenerate 2–3 times and pick the cleanest variant.
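
Those three fixes can be written down as a small settings helper. It is a sketch only: the `suggest_retry_settings` name and the dictionary keys are made up for illustration and are not generator or API parameters. Sending a square photo to the portrait size is also an assumption.

```python
def suggest_retry_settings(width: int, height: int) -> dict:
    """Suggest the larger size that matches the photo's orientation.

    Square photos fall back to portrait here (an assumption of this sketch).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    size = "1024x1536" if height >= width else "1536x1024"
    return {
        "quality": "high",  # fix (1): raise quality
        "size": size,       # fix (2): larger canvas, larger glyphs
        "attempts": 3,      # fix (3): regenerate and pick the cleanest
    }


print(suggest_retry_settings(3000, 4000))  # a portrait phone photo
print(suggest_retry_settings(4000, 3000))  # a landscape photo
```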


Can I change how many annotations appear?

The phrase 'Annotate 8–12 distinct elements' in the prompt is tunable. Change to '15–20 elements' for denser, or '3–5 key elements' for sparser. The model responds reliably to count instructions.
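
For repeat use, the count can be swapped with a small text substitution. Below is a minimal Python sketch, assuming the prompt contains the phrase quoted above with a numeric range. `set_annotation_count` is a hypothetical helper, not part of the generator.

```python
import re

# Matches "Annotate 8–12" with either an en dash or a plain hyphen.
_COUNT_RANGE = re.compile(r"(Annotate\s+)\d+\s*[–-]\s*\d+")


def set_annotation_count(prompt: str, low: int, high: int) -> str:
    """Rewrite the first 'Annotate N–M' range in the prompt to low–high."""
    if not 1 <= low <= high:
        raise ValueError("need 1 <= low <= high")
    new_prompt, hits = _COUNT_RANGE.subn(
        lambda m: f"{m.group(1)}{low}–{high}", prompt, count=1
    )
    if hits == 0:
        raise ValueError("no 'Annotate N–M' phrase found in prompt")
    return new_prompt


base = "Annotate 8–12 distinct elements in the photo."
print(set_annotation_count(base, 15, 20))  # denser
print(set_annotation_count(base, 3, 5))    # sparser
```

Only the numbers change. The rest of the sentence stays as written, so edit the wording by hand if you want 'key elements' instead of 'distinct elements'.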


Why do the demo images use different languages?

Scenes 1–3 (food, café, convenience store) show English-handwriting demos. Scenes 4–5 (indoor, pet) show Chinese-handwriting demos. The layout, density, and decoration patterns read the same either way. Click "Use this scene" and the prompt is copied in the language of the page you are viewing.


Upload a snapshot and try it

Free credits for new users. Prompt pre-filled. Upload your photo → hit generate → walk away with a Japanese-style annotated diary cover.