June 2, 2026
MiniMax, the Shanghai-based AI lab, released its flagship model MiniMax M3 on June 1, 2026. M3 is billed as the first open-weight model to combine frontier-level coding performance, a 1-million-token context window, and native multimodal capabilities (image and video understanding) in a single architecture, although the weights themselves have not been published yet. The model scores 59.0% on SWE-Bench Pro, surpassing both OpenAI GPT-5.5 and Google Gemini 3.1 Pro on this widely used software engineering benchmark.
What can MiniMax M3 do?
MiniMax M3 is built for long-horizon, complex coding and agentic tasks. The model processes up to 1 million tokens of context at once, five times the context of its predecessor, MiniMax M2.7. This allows M3 to work across entire codebases, multi-document research pipelines, and long-running agent sessions without losing track of earlier information.
Beyond coding, MiniMax M3 natively understands images and video, making it a multimodal model rather than a text-only system. MiniMax demonstrated three long-horizon tasks at launch: autonomous reproduction of an ICLR 2025 research paper in 12 hours (producing 18 commits and 23 figures), a 24-hour CUDA kernel optimization run that improved FP8 hardware utilization from 7.6% to 71.3% (a 9.4x speedup across 147 benchmark submissions), and a model-training task where M3 scored 0.37 on PostTrainBench by training another model end to end.
MiniMax M3 benchmarks and technical specs
MiniMax M3 scores 59.0% on SWE-Bench Pro, a benchmark that measures real-world software engineering fixes. This result places M3 above OpenAI GPT-5.5 and Google Gemini 3.1 Pro, and approaches Anthropic Claude Opus 4.7 on the same test. On Terminal-Bench 2.1 (command-line agent tasks), M3 scores 66.0%. On MCP Atlas, a tool-use benchmark, it reaches 74.2%. On BrowseComp, a web search and browsing benchmark, MiniMax M3 scores 83.5, which surpasses Claude Opus 4.7’s score of 79.3.
The core architectural innovation in MiniMax M3 is MiniMax Sparse Attention (MSA). This design uses a lightweight index branch to scan incoming tokens and select which blocks of past tokens require attention, then runs attention only on those relevant blocks. At 1-million-token context length, MSA reduces per-token compute to one-twentieth of the prior generation's and delivers more than 9x faster prefill (processing the input) and more than 15x faster decoding (generating output). Output speed runs at approximately 100 tokens per second, roughly 3x faster than Claude Opus. MiniMax has not disclosed the total parameter count of M3.
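The block-selection idea described above can be illustrated with a toy, pure-Python sketch. This is not MiniMax's implementation, which has not been published in detail. Scoring each block by the dot product of the query with the block's mean key is an assumption made for illustration, as are the function names select_blocks and sparse_attention and the parameters block_size and top_k.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Hypothetical index branch: score each block of past keys cheaply
    (query dotted with the block's mean key) and keep the top_k blocks.
    Returns the start offsets of the chosen blocks in positional order."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), start))
    chosen = sorted(scores, reverse=True)[:top_k]
    return sorted(start for _, start in chosen)


def sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention for one query, restricted to the selected blocks.
    Returns the output vector and the token indices that were attended."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(dim)]
    return out, idx


# Toy data: 8 past tokens in blocks of 2; only block 2 aligns with the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
out, attended = sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(attended, out)
```

In this sketch the full attention step touches only top_k * block_size tokens instead of all past tokens, which is where the per-token savings at long context would come from. The block-scoring pass still scans every block, so it has to be much cheaper than attention itself for the scheme to pay off.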
Several benchmark results were obtained on MiniMax’s own infrastructure using agent scaffolding such as Claude Code and Mini-SWE-Agent. Independent third-party verification is still pending, and M3 has not yet appeared on the DeepSWE board for long-horizon software tasks.
How does MiniMax M3 compare to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro?
On SWE-Bench Pro, MiniMax M3 (59.0%) outperforms OpenAI GPT-5.5 and Google Gemini 3.1 Pro, though it still trails Anthropic Claude Opus 4.7 by a narrow margin.
On BrowseComp web search, M3 (83.5) surpasses Claude Opus 4.7 (79.3).
The cost gap is substantial. MiniMax M3 API input pricing starts at approximately $0.30 per million tokens under the launch discount ($0.60 at standard rates), while Claude Opus 4.7 charges $5.00 per million input tokens and $25.00 per million output tokens. That makes M3 roughly 16.7x cheaper on input at the launch price and about 8.3x cheaper at the standard rate.
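The price multiples are easy to check. The short sketch below uses only the per-million-token prices quoted in this article; the dictionary and function names are made up for illustration.

```python
# Per-million-token prices in USD, as quoted in this article.
OPUS_47 = {"input": 5.00, "output": 25.00}
M3_STANDARD = {"input": 0.60, "output": 2.40}
M3_LAUNCH = {"input": 0.30, "output": 1.20}


def price_ratios(reference, candidate):
    """How many times cheaper `candidate` is than `reference`, per token type."""
    return {kind: reference[kind] / candidate[kind] for kind in reference}


launch = price_ratios(OPUS_47, M3_LAUNCH)
standard = price_ratios(OPUS_47, M3_STANDARD)

for label, ratios in (("launch", launch), ("standard", standard)):
    print(f"{label}: input {ratios['input']:.1f}x, output {ratios['output']:.1f}x")
```

The "more than 15x" figure holds only at the discounted launch price. At standard rates the multiple on input is closer to 8x.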
Within the Chinese open-weight model ecosystem, MiniMax M3 competes directly with DeepSeek V4 and Alibaba Qwen3.7-Max. All three target similar agentic coding use cases and offer open weights. MiniMax’s differentiator is the combination of frontier coding, 1M-token context, and native multimodality in a single model. The MSA architecture is what makes the 1M-token context window practical for production workloads rather than just a specification on paper, by cutting inference costs to a fraction of what full-attention models require at that scale.
MiniMax M3 availability and pricing
MiniMax M3 is available now through the MiniMax API, MiniMax Code (the company’s agent product), and monthly token plan subscriptions.
API pricing is set at approximately $0.60 per million input tokens and $2.40 per million output tokens at standard rates, with a 50% launch discount for the first week ($0.30 input / $1.20 output).
With cache optimization, the blended cost drops to roughly $0.06 per million tokens.
Monthly token plans are available at $20 (Plus, approximately 1.7 billion tokens), $50 (Max, approximately 5.1 billion tokens), and $120 (Ultra, approximately 9.8 billion tokens).
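For budgeting, the quoted rates can be turned into a rough per-request estimate. The sketch below assumes simple linear pay-as-you-go billing with no caching, and it treats the approximate plan allowances as exact. Both are simplifications, and the names request_cost and plan_rate_per_million are hypothetical.

```python
# (input, output) USD per million tokens, as quoted in this article.
RATES = {"standard": (0.60, 2.40), "launch": (0.30, 1.20)}

# (monthly price in USD, approximate token allowance), as quoted in this article.
PLANS = {"Plus": (20, 1.7e9), "Max": (50, 5.1e9), "Ultra": (120, 9.8e9)}


def request_cost(input_tokens, output_tokens, tier="standard"):
    """Estimated USD cost of one request, assuming linear billing and no cache hits."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def plan_rate_per_million(plan):
    """Effective USD per million tokens if a plan's allowance is fully used."""
    price, tokens = PLANS[plan]
    return price / (tokens / 1_000_000)


# A full 1M-token prompt with a 10,000-token reply.
print(f"standard: ${request_cost(1_000_000, 10_000):.3f}")
print(f"launch:   ${request_cost(1_000_000, 10_000, 'launch'):.3f}")
for name in PLANS:
    print(f"{name}: ${plan_rate_per_million(name):.4f} per million tokens")
```

Under these assumptions, a request that fills the entire 1-million-token window costs about $0.62 at standard rates. The monthly plans work out to roughly one cent per million tokens, but only if the allowance is fully used.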
Open weights and a full technical report are expected on Hugging Face and GitHub within approximately ten days of launch. The licensing terms have not been published yet. MiniMax’s previous model, M2.7, shipped under a license that restricted commercial use without prior written authorization, so M3 may follow a similar approach. MiniMax has been publicly listed on the Hong Kong Stock Exchange since January 2026 and is preparing for a secondary listing on Shanghai’s Star Market.
What does MiniMax M3 mean for the AI model landscape?
MiniMax M3 represents a notable shift in two areas. First, it demonstrates that sparse attention can work at production scale for long-context models. MiniMax itself set sparse attention aside for its entire M2 generation in favor of full attention, calling the infrastructure “not yet mature” at the time. Returning to sparse attention with MSA and achieving order-of-magnitude speedups suggests the technology has caught up. Anthropic, Google DeepMind, and OpenAI all have efficient-attention research underway, but none has shipped a flagship model with comparable public efficiency commitments at 1M-token context.
Second, M3 continues to widen the cost gap between Chinese open-weight models and Western proprietary alternatives. Following the pricing pressure from DeepSeek and Qwen, MiniMax now offers frontier-competitive coding performance at a fraction of the cost of Claude Opus or GPT-5.5. For developers evaluating coding agents and long-context workloads, M3 is worth testing immediately through the API, with a more definitive assessment possible once independent benchmarks and the open weights become available.
For full details on MiniMax M3, including the architecture overview and benchmark methodology, see the official announcement on the MiniMax blog.