Images & multimodal: generate and read visuals
Goal. Use Gemini with visuals in both directions: generate a figure or diagram from a description, and have it read and explain an image you already have.
Prompt. "Generate a clean, labelled schematic diagram of the central dogma (DNA → RNA → protein) suitable for a lecture slide, with clear arrows and no decorative clutter."
Type the request; Gemini generates the image inline. Image generation is available on the free tier.
1. Generate an image from a description. Gemini's image model is Nano Banana 2, with improvements in world knowledge, character consistency across multiple images, local edits, and text rendering inside the picture. Text rendering matters most for labelled scientific diagrams and infographics.
2. Read an image you upload. Give Gemini a microscopy image, a gel photo, or a figure and ask it to describe what it sees, read embedded text, or flag anomalies, using the same Google Lens technology that reads text in pictures.
3. Trick: edit, don't regenerate. Ask for a local change, such as "keep everything but relabel the third arrow 'translation' and make the background white", instead of regenerating from scratch. Local edits preserve the parts that were already right.
You'll see. A labelled diagram generated from one sentence, plus a description of an image you uploaded — Gemini working as both an image maker and an image reader.
Cost. Image generation and editing are available on the free tier, with downloads at 1K resolution. Google AI plans add 2K-resolution downloads and the option to regenerate text-heavy images and infographics with the higher-detail Nano Banana Pro model.
Takeaway. Gemini is multimodal in both directions: generate labelled visuals with Nano Banana 2, and upload real lab images for it to read and explain. Edit locally rather than regenerating.