Publication date: 20 March 2026
Microsoft has released MAI-Image-2, a text-to-image model developed by its AI Superintelligence team. The model ranks third on the Arena.ai text-to-image leaderboard, behind models from Google and OpenAI. It is available now through the MAI Playground and is beginning to roll out in Copilot and Bing Image Creator, with API access open to select enterprise customers.
What MAI-Image-2 does
MAI-Image-2 generates images from text prompts, with a particular focus on three areas: photorealism, in-image text rendering, and complex scene generation. For photorealism, the model targets natural lighting, accurate skin tones, and realistic environmental textures, which the team says should reduce post-production work. Text rendering is aimed at infographics, slides, and typographic compositions, where text inside the generated image has to be legible.
Scene generation covers dense, multi-element compositions, including surreal concepts and cinematic framing. According to Microsoft, the model was developed in close collaboration with photographers, designers, and visual storytellers, who helped shape how it handles creative prompts.
Performance and technical context
Microsoft has not disclosed technical specifications for MAI-Image-2, such as parameter count, training data, or architecture. On the Arena.ai leaderboard, the model sits directly below Google's Gemini 3.1 Flash image generation model and OpenAI's GPT Image 1.5. Arena.ai rankings are built from human preference comparisons across a large number of image generation tasks, so they reflect what people actually prefer to look at, which makes the metric practically relevant for creative use cases.
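The announcement does not say how Arena.ai turns individual human votes into a ranking. Public preference leaderboards commonly aggregate pairwise "A beat B" votes with an Elo or Bradley-Terry rating model, so the sketch below uses a plain online Elo update purely to illustrate the idea. It is not Arena.ai's actual method; the model names are placeholders, and the K-factor of 32 and starting rating of 1000 are arbitrary assumptions.

```python
from collections import defaultdict

K = 32          # assumption: step size for each rating update
INITIAL = 1000  # assumption: rating every model starts from


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rate(votes):
    """Turn (winner, loser) pairs from human comparisons into a sorted leaderboard."""
    ratings = defaultdict(lambda: float(INITIAL))
    for winner, loser in votes:
        # Compute the expectation before either rating changes.
        e_w = expected_score(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up.
        ratings[winner] += K * (1 - e_w)
        ratings[loser] -= K * (1 - e_w)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for name, rating in rate(votes):
    print(f"{name}: {rating:.1f}")
```

A real leaderboard aggregates many thousands of votes and typically reports confidence intervals, which is one reason a third-place ranking says little on its own about how close the top models are.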
Microsoft has not announced pricing for API access, which remains limited to the current rollout. For end users, the MAI Playground currently enforces a 30-second cooldown between generations and a limit of 15 images, after which a 24-hour lockout applies. Only the 1:1 (square) aspect ratio is supported at this stage; landscape, portrait, and custom ratios are not yet available.
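The Playground limits can be modeled as a small quota tracker. This is a hypothetical sketch, not Microsoft's implementation: the class `PlaygroundQuota` and its methods are invented for illustration, and it assumes that the 24-hour lockout starts at the 15th image and that the image count resets once the lockout ends. The announcement does not specify either detail.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_S = 30           # cooldown between generations, per the announcement
IMAGES_PER_WINDOW = 15    # images allowed before the lockout
LOCKOUT_S = 24 * 60 * 60  # 24-hour lockout, in seconds


@dataclass
class PlaygroundQuota:
    """Hypothetical model of the MAI Playground limits (not Microsoft's code)."""
    used: int = 0
    last_gen: Optional[float] = None
    locked_until: Optional[float] = None

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next generation; clears an expired lockout."""
        if self.locked_until is not None:
            if now < self.locked_until:
                return self.locked_until - now
            # Assumption: the image count resets once the lockout has expired.
            self.used = 0
            self.locked_until = None
            self.last_gen = None
        if self.last_gen is not None and now - self.last_gen < COOLDOWN_S:
            return COOLDOWN_S - (now - self.last_gen)
        return 0.0

    def try_generate(self, now: float) -> bool:
        """Record a generation at time `now` if the limits allow it."""
        if self.wait_time(now) > 0:
            return False
        self.used += 1
        self.last_gen = now
        if self.used >= IMAGES_PER_WINDOW:
            # Assumption: the 24-hour lockout starts at the 15th image.
            self.locked_until = now + LOCKOUT_S
        return True
```

Under these assumptions, a user generating at the fastest allowed pace uses up the daily allowance in 14 × 30 s = 7 minutes, then waits a full day before generating again.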
Comparison with earlier Microsoft image generation efforts
Microsoft’s previous image generation capability was largely dependent on third-party models integrated into Bing Image Creator and Copilot. MAI-Image-2 marks the company’s first in-house text-to-image model developed specifically under the Microsoft AI Superintelligence banner. The move signals an intent to build more of the underlying AI stack internally, rather than relying entirely on partnerships.
Third place on Arena.ai is notable because it puts Microsoft ahead of image generation offerings from Stability AI, Midjourney, and other dedicated image generation companies. However, the public announcement does not quantify how far MAI-Image-2 trails the first- and second-placed models.
Availability and access
MAI-Image-2 is currently available through the MAI Playground at microsoft.ai. Enterprise API access has been activated for select customers, with WPP listed as an early commercial partner. Broader developer access through Microsoft Foundry is described as coming soon, though no date has been provided. The model is also being integrated into Copilot and Bing Image Creator in a phased rollout, meaning general availability via those products may still take time.
The official announcement is available at microsoft.ai/news/introducing-MAI-Image-2.