Introducing 4o Image Generation

Experience powerful and practical image creation with our most advanced, natively multimodal model—capable of producing precise, photorealistic visuals.

At OpenAI, we’ve always envisioned image generation as a core function of language models. That’s why GPT‑4o includes our most capable image generator yet—designed to go beyond just beautiful visuals and deliver truly useful ones.

Making Image Generation More Useful

From ancient cave paintings to today’s infographics, humans have long used images not just to decorate, but to communicate, persuade, and explain. While current generative models can create stunning, surreal scenes, they often fall short when it comes to practical visuals—logos, diagrams, and images that convey precise meaning through symbolic language and shared understanding.

GPT‑4o’s image generation stands out by rendering text accurately, closely following user prompts, and leveraging its built-in knowledge and conversation context. It can even use uploaded images as visual references or transform them creatively. This opens the door to crafting the exact image you have in mind—making visuals a more effective communication tool than ever.

Enhanced Capabilities

By training on a combined dataset of online text and imagery, GPT‑4o learns not only the relationship between words and visuals but also how images relate to one another. Through extensive post-training refinement, the model has developed a surprising level of visual fluency—producing consistent, context-aware, and highly functional imagery.

Text Rendering

While a picture may be worth a thousand words, sometimes a few well-placed words in an image elevate its message. GPT‑4o blends symbolic text with imagery, transforming image generation into a true medium for visual communication.

Multi-Turn Image Generation

Because image generation is built into GPT‑4o natively, you can iterate and refine your visuals through conversation. Whether you’re designing a game character or editing a concept sketch, GPT‑4o ensures visual consistency across revisions, incorporating chat history into each iteration.

Detailed Prompt Adherence

GPT‑4o excels at following complex instructions, maintaining precision even with prompts involving 10–20 objects, well beyond the 5–8 objects at which other models typically start to struggle. Its ability to bind traits and relationships tightly to the right elements gives users finer control over the generated results.

In-Context Learning

By analyzing user-uploaded images, GPT‑4o learns and integrates visual details directly into the image generation process—making it easier to tailor outputs to your needs.

Built-in World Knowledge

Native image generation allows GPT‑4o to seamlessly combine language understanding with visual expression, resulting in a smarter, more capable model.

Style & Photorealism

Training on a wide variety of visual styles enables GPT‑4o to convincingly create or adapt images across different artistic and photorealistic genres.

Current Limitations

While powerful, the model is not without its flaws. We’re actively working to address known limitations through ongoing improvements after launch.

Prioritizing Safety

In line with our Model Spec, we aim to balance creative freedom with robust safety. GPT‑4o supports valuable use cases such as educational content, historical illustration, and game development, while maintaining strict content policies. We continue to block generation of content that violates our standards, including explicit material and abusive deepfakes.

C2PA & Reverse Search for Transparency
Every generated image includes C2PA metadata to clearly identify it as created by GPT‑4o. We’ve also developed an internal reverse search system based on generation attributes to verify image provenance.

Proactive Content Filtering
When an image involves real people, we apply heightened restrictions on what can be generated, with especially strict safeguards around nudity and graphic content. Our safety systems are continuously evolving as we monitor real-world usage and improve our policies.

Reasoning-Driven Safety Mechanisms
We’ve trained a reasoning language model to work from interpretable, human-written safety policies. This model helped guide development decisions and refine ambiguous policy areas. Together with existing moderation tools from ChatGPT and Sora, it gives us a more robust system for enforcing safety on both input prompts and generated outputs.

Availability and Access

Starting today, GPT‑4o image generation is available as the default image tool in ChatGPT for Plus, Pro, Team, and Free users. Access for Enterprise and Education users is coming soon. It’s also integrated into Sora.

For fans of DALL·E, that model remains available via its dedicated GPT.

API access to GPT‑4o’s image generation will begin rolling out to developers over the coming weeks.

Creating or customizing images is now as simple as having a chat. Describe what you need—including specific details like aspect ratio, hex color codes, or transparent backgrounds—and GPT‑4o will deliver. Just note: because these images are highly detailed, rendering can take up to a minute.
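
As a rough illustration of the kind of detail mentioned above, the sketch below assembles a chat prompt that states an aspect ratio, hex color codes, and a transparent background. It is only a convenience for composing the text you would type into ChatGPT: the ImageRequest class and its field names are hypothetical, it calls no OpenAI API, and the exact wording that works best is an assumption rather than documented behavior.

```python
import re
from dataclasses import dataclass, field

# Matches six-digit hex colors such as "#1A73E8".
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass
class ImageRequest:
    """Hypothetical helper: collects image details and renders them as prompt text."""

    subject: str
    aspect_ratio: str = "1:1"
    colors: list[str] = field(default_factory=list)
    transparent_background: bool = False

    def to_prompt(self) -> str:
        # Reject malformed color codes before they end up in the prompt.
        for color in self.colors:
            if not HEX_COLOR.match(color):
                raise ValueError(f"not a 6-digit hex color: {color!r}")

        parts = [self.subject.strip().rstrip(".") + "."]
        parts.append(f"Aspect ratio {self.aspect_ratio}.")
        if self.colors:
            parts.append("Use these colors: " + ", ".join(self.colors) + ".")
        if self.transparent_background:
            parts.append("Transparent background.")
        return " ".join(parts)


request = ImageRequest(
    subject="A flat vector logo of a paper crane",
    aspect_ratio="16:9",
    colors=["#1A73E8", "#FFFFFF"],
    transparent_background=True,
)
print(request.to_prompt())
```

The resulting text can be pasted into a conversation as-is and then refined over further turns, for example by asking for a different color or a tighter crop.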