Images and multimodal: see and create

Lesson 4 · ~15 min

🎯Goal. Use ChatGPT both ways with images — have it interpret a figure or photo you upload, and generate a diagram or illustration from a description.

▶ Try this prompt

Generate a clean, labelled schematic of the central dogma (DNA → RNA → protein) suitable for a lecture slide: simple, high-contrast, with arrows and labels. Then suggest a one-line caption.

Send this to generate an image. To go the other way, upload a photo of a gel or a chart and ask "what does this figure show, and what stands out?" — that uses ChatGPT's vision to read the image.

Steps

1. Generate from a description. Describe the image you want: content, style, and where it'll be used (slide, poster, figure). Iterate: "make the arrows thicker", "remove the background".
2. Interpret what you upload (vision). Attach a photo, chart, gel, or microscopy image and ask ChatGPT to describe or interpret it. Good for a first read; not a substitute for your own quantification.
3. Stay honest about generated images. AI-generated figures are illustrations, not data. Never present a generated image as an experimental result.
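
If you write image prompts often, the pattern in step 1 (content, style, intended use, then small follow-up changes) can be kept as a reusable template. The sketch below is only an illustration: the function name `build_messages` and its parameters are hypothetical, and it does not talk to ChatGPT at all. It just builds the text you would paste into the chat, one message at a time.

```python
from typing import List, Sequence


def build_messages(content: str, style: str, use: str,
                   refinements: Sequence[str] = ()) -> List[str]:
    """Return the chat messages to send, in order (hypothetical helper).

    The first message combines content, intended use, and style.
    Each refinement becomes its own short follow-up message.
    """
    first = f"Generate {content} suitable for a {use}: {style}."
    follow_ups = [f"Revise the image: {change}." for change in refinements]
    return [first, *follow_ups]


# Example: the central dogma schematic from this lesson.
messages = build_messages(
    content="a clean, labelled schematic of the central dogma (DNA → RNA → protein)",
    style="simple, high-contrast, with arrows and labels",
    use="lecture slide",
    refinements=["make the arrows thicker", "remove the background"],
)

for message in messages:
    print(message)
```

Keeping each refinement as a separate message mirrors how iteration works in the chat: one small, specific change per turn is easier to check than a long list of edits sent at once.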

✓You'll see. A labelled schematic created from your description, and — when you upload one — a plain-language read of a figure or photo.

💳Cost. Image generation and vision both work on the free tier but are limited (slower, fewer images). Plus gives more and higher-quality image creation; Pro gives unlimited, faster generation.

💡Takeaway. ChatGPT is multimodal in both directions: it can read images you upload and create new ones. Generated images, though, are illustrations, never data.
