Images & multimodal: generate and read visuals

Lesson 4 · ~15 min

🎯 Goal. Use Gemini both ways with visuals: generate a figure or diagram from a description, and have it read and explain an image you already have.

▶ Try this prompt

Generate a clean, labelled schematic diagram of the central dogma (DNA → RNA → protein) suitable for a lecture slide, with clear arrows and no decorative clutter.

Type the request; Gemini generates the image inline. Image generation is available on the free tier.

Steps

1. Generate an image from a description. Gemini's image model is Nano Banana 2, with improvements in world knowledge, character consistency across multiple images, local edits, and text rendering inside the picture. The text rendering matters most for labelled scientific diagrams and infographics.
2. Read an image you upload. Hand Gemini a microscopy image, a gel photo, or a figure and ask it to describe what it sees, read embedded text, or flag anomalies. It uses the same Google Lens technology that reads text in pictures.
3. Trick: edit, don't regenerate. Ask for a local change, such as "keep everything but relabel the third arrow 'translation' and make the background white", instead of regenerating from scratch. Local edits preserve the parts that were already right.
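
The edit request in step 3 follows a simple pattern: say what stays, then list each change. For anyone who keeps a checklist of fixes and wants to turn it into one request to paste into Gemini, here is a minimal Python sketch of that pattern. The helper name build_edit_prompt and the exact wording it produces are illustrative assumptions, not a Gemini feature or API.

```python
def build_edit_prompt(changes, keep="everything else"):
    """Compose a local-edit request: state what to keep, then list each change.

    `changes` is a list of short instructions; `keep` names what must stay.
    Both the function and its phrasing are hypothetical, for illustration only.
    """
    if not changes:
        raise ValueError("list at least one change")
    parts = "; ".join(changes)
    return f"Keep {keep} exactly as it is. Change only the following: {parts}."


# Example: the two fixes from step 3, combined into one request to paste in.
prompt = build_edit_prompt([
    "relabel the third arrow 'translation'",
    "make the background white",
])
print(prompt)
```

Naming what to keep first is a deliberate choice: it frames the request as an edit of the existing image, so the parts that were already right are less likely to be redrawn.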

✓ You'll see. A labelled diagram generated from one sentence, plus a description of an image you uploaded, with Gemini working as both an image maker and an image reader.

💳 Cost. Image generation and editing are on the free tier (downloads at 1K resolution). Google AI plans add 2K-resolution downloads and regeneration with the higher-detail Nano Banana Pro for text-heavy images and infographics.

💡 Takeaway. Gemini is multimodal in both directions: generate labelled visuals with Nano Banana 2, and upload real lab images for it to read and explain. Edit locally rather than regenerating.
