Microsoft releases MAI-Image-2: text-to-image model reaches third place on global arena leaderboard

20-03-2026

Microsoft has released MAI-Image-2, an in-house text-to-image model that currently ranks third on the Arena.ai text-to-image leaderboard. Developed by the Microsoft AI Superintelligence team with a focus on photorealism and in-image text rendering, it is now rolling out via the MAI Playground and Copilot as a competitor to image models from Google and OpenAI.

Written by: Carlein Polinder


Publication date: 20 March 2026

Microsoft has released MAI-Image-2, a text-to-image model developed by its AI Superintelligence team. The model ranks third on the Arena.ai text-to-image leaderboard, behind Google and OpenAI. It is now accessible through the MAI Playground and is beginning to roll out within Copilot and Bing Image Creator, with API access available to select enterprise customers.

What MAI-Image-2 does

MAI-Image-2 generates images from text prompts with a specific focus on three areas: photorealism, in-image text rendering, and complex scene generation. The photorealism component targets natural lighting, accurate skin tones, and environmental textures that the team describes as reducing post-production work. The text rendering capability is intended for use in infographics, slides, and typographic compositions where readability within the generated image matters.

The scene generation capability handles dense, multi-element compositions including surreal concepts and cinematic framing. According to Microsoft, the model was developed in close collaboration with photographers, designers, and visual storytellers to shape its handling of creative prompts.

Performance and technical context

Microsoft has not disclosed technical specifications for MAI-Image-2, including parameter counts, training data, or architecture details. The model’s ranking on the Arena.ai leaderboard places it directly below Google’s Gemini 3.1 Flash image generation and OpenAI’s GPT Image 1.5. Arena.ai rankings are based on human preference comparisons across a large number of image generation tasks, which gives the metric practical relevance for creative use cases.
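To illustrate how pairwise human preference votes can be turned into a ranking, here is a minimal sketch using an Elo-style rating update. This is an assumption for illustration only: the article does not say which rating method Arena.ai uses, and the model names, starting ratings, K-factor, and votes below are all hypothetical.

```python
# Illustrative sketch only: Elo-style ratings from pairwise preference votes.
# Not confirmed to be Arena.ai's method; all names and numbers are hypothetical.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Modelled probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from the loser to the winner after one vote."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Every hypothetical model starts at the same rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}

# Each vote is (preferred image's model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for winner, loser in votes:
    update(ratings, winner, loser)

# Sort models from highest to lowest rating.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because each vote only moves points between the two models being compared, a ranking of this kind reflects relative preference. That fits the article's observation that the size of the gap between places is not stated in the announcement.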

Microsoft has not announced pricing for API access, which is currently limited to select enterprise customers. For end users, the MAI Playground currently imposes a 30-second cooldown between generations and a limit of 15 images, after which a 24-hour lockout applies. Only the 1:1 aspect ratio is supported at this stage; landscape, portrait, and custom ratios are not yet available.
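The Playground limits can be worked through with a small model. The sketch below is not Microsoft code: the class and its method names are hypothetical, and it assumes the 24-hour lockout starts at the moment the 15th image is generated, which the announcement does not specify.

```python
# Illustrative model of the reported MAI Playground limits.
# Assumption: the lockout begins when the 15th image is generated.
from dataclasses import dataclass
from typing import Optional

COOLDOWN_S = 30            # reported cooldown between generations
MAX_IMAGES = 15            # reported image limit before lockout
LOCKOUT_S = 24 * 60 * 60   # reported 24-hour lockout, in seconds


@dataclass
class PlaygroundQuota:
    used: int = 0
    last_request: Optional[float] = None
    locked_until: float = 0.0

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next generation is allowed."""
        if now < self.locked_until:
            return self.locked_until - now
        if self.last_request is None:
            return 0.0
        return max(0.0, self.last_request + COOLDOWN_S - now)

    def record(self, now: float) -> None:
        """Register one generation at time `now` (seconds)."""
        if self.wait_time(now) > 0:
            raise RuntimeError("generation not allowed yet")
        self.used += 1
        self.last_request = now
        if self.used >= MAX_IMAGES:
            self.locked_until = now + LOCKOUT_S
            self.used = 0


# Generate images as fast as the limits allow, starting at t = 0.
quota = PlaygroundQuota()
clock = 0.0
for _ in range(MAX_IMAGES):
    clock += quota.wait_time(clock)
    quota.record(clock)

print(clock)                   # time of the 15th generation, in seconds
print(quota.wait_time(clock))  # remaining lockout, in seconds
```

Under these assumptions, the fastest a user can exhaust the quota is 14 cooldowns of 30 seconds, or 7 minutes, after which the full 24-hour wait applies.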

Comparison with earlier Microsoft image generation efforts

Microsoft’s previous image generation capability was largely dependent on third-party models integrated into Bing Image Creator and Copilot. MAI-Image-2 marks the company’s first in-house text-to-image model developed specifically under the Microsoft AI Superintelligence banner. The move signals an intent to build more of the underlying AI stack internally, rather than relying entirely on partnerships.

The third-place Arena.ai ranking is notable: it puts Microsoft ahead of image generation offerings from Stability AI, Midjourney, and other dedicated image generation companies. However, the public announcement does not quantify the gap to first and second place.

Availability and access

MAI-Image-2 is currently available through the MAI Playground at microsoft.ai. Enterprise API access has been activated for select customers, with WPP listed as an early commercial partner. Broader developer access through Microsoft Foundry is described as coming soon, though no date has been provided. The model is also being integrated into Copilot and Bing Image Creator in a phased rollout, meaning general availability via those products may still take time.

The official announcement is available at microsoft.ai/news/introducing-MAI-Image-2.