19 March 2026
Xiaomi has launched three in-house foundation models under the MiMo-V2 banner:
MiMo-V2-Pro, a large language model with over one trillion total parameters;
MiMo-V2-Omni, a full-modality model handling text, vision and audio;
and MiMo-V2-TTS, a speech synthesis model trained on hundreds of millions of hours of audio data.
The launch also confirmed that Hunter Alpha, a mystery model that had been available on the developer platform OpenRouter since 11 March without any attribution, was in fact an early version of MiMo-V2-Pro.
The announcement marks Xiaomi’s most ambitious move yet in foundation model development, positioning the company alongside established AI labs with a full-stack model suite covering language, multimodal reasoning and voice generation.
Three models, three distinct functions
MiMo-V2-Pro is designed for what Xiaomi calls the “agent era”: complex, multi-step tasks that require sustained reasoning, long-context retention and reliable code execution. With a 1-million-token context window and 42 billion active parameters out of more than one trillion total, the model can process entire codebases or large document sets in a single session. It uses a Hybrid Attention mechanism with a 7:1 ratio, an increase from the 5:1 ratio used in its predecessor.
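To put the sparsity these figures imply in perspective, the active fraction per token can be computed directly. This is a rough sketch: Xiaomi says only “over one trillion” total parameters, so the round 1-trillion figure below is an assumption and the result is approximate.

```python
# Sketch: per-token active-parameter fraction for MiMo-V2-Pro's MoE setup.
# 1.0e12 total is a rounded assumption ("over one trillion" in the
# announcement), so the fraction is an upper-bound estimate.
TOTAL_PARAMS = 1.0e12   # total parameters (rounded assumption)
ACTIVE_PARAMS = 42e9    # 42B parameters active per token (announced)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # roughly 4.2%
```

In other words, only about one parameter in twenty-four participates in any given forward pass, which is how the model keeps inference costs far below what its total scale would suggest.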
MiMo-V2-Omni takes a different direction. It processes text, images, video and audio in a unified architecture, targeting use cases where multiple input types occur simultaneously. Xiaomi reports strong scores in audio understanding and video event forecasting benchmarks, with an MMAU-Pro score of 69.4 and a FutureOmni score of 66.7.
MiMo-V2-TTS is a speech synthesis model built on a self-developed Audio Tokenizer with multi-codebook joint modeling. It supports adjustable tone, emotion and speaking style, covers multiple Chinese dialects and can handle both conversational speech and singing.
Technical specifications and benchmark performance
MiMo-V2-Pro uses a Mixture-of-Experts architecture with more than one trillion total parameters and 42 billion active parameters per token, making it roughly three times larger in total scale than MiMo-V2-Flash while keeping inference costs manageable. API pricing is set at $1 per million input tokens and $3 per million output tokens at up to 256K context, rising to $2/$6 for the full 1-million-token range.
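The two-tier pricing can be sketched as a small cost calculator. One caveat: the announcement does not spell out boundary behavior, so the assumption below that a request exceeding 256K input tokens is billed entirely at the higher tier is illustrative, not confirmed.

```python
# Sketch of MiMo-V2-Pro's tiered API pricing (USD per million tokens).
# Assumption: a request whose input exceeds 256K tokens is billed
# entirely at the higher tier; Xiaomi has not detailed the boundary rule.
TIER_LIMIT = 256_000  # tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the announced rates."""
    if input_tokens <= TIER_LIMIT:
        in_rate, out_rate = 1.00, 3.00   # up to 256K context
    else:
        in_rate, out_rate = 2.00, 6.00   # full 1-million-token range
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 200K-token input with a 4K-token reply lands in the lower tier:
print(f"${request_cost(200_000, 4_000):.3f}")  # $0.212
```

At these rates, even a request that fills most of the 1-million-token window costs on the order of a couple of dollars, which is the economics Xiaomi is betting on for high-volume agent workloads.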
On the Artificial Analysis Intelligence Index, MiMo-V2-Pro ranks 8th globally and 2nd among Chinese language models. On ClawEval, an agent benchmark, it scored 61.5, close to Claude Opus 4.6 at 66.3 and notably ahead of GPT-5.2 at 50.0. On Terminal-Bench 2.0, which tests coding in real terminal environments, it scored 86.7.
MiMo-V2-Omni is priced at $0.40 per million input tokens and $2.00 per million output tokens. Its multimodal benchmark results place it ahead of Gemini 3 Pro and Claude Opus 4.6 on audio and video tasks, according to Xiaomi’s published comparisons.
The Hunter Alpha mystery and what it reveals about Xiaomi’s strategy
Hunter Alpha appeared on OpenRouter on 11 March without any developer attribution and quickly attracted attention from developers after topping several agent benchmarks. Speculation ranged from an unreleased DeepSeek V4 to a model from a previously unknown lab. Xiaomi confirmed on 19 March that Hunter Alpha was an early, anonymized deployment of MiMo-V2-Pro.
The tactic gave Xiaomi unfiltered developer feedback before the public launch, while the strong benchmark performance generated organic attention in AI communities. The approach is unusual for a hardware-first company entering the foundation model space.
Positioning against established models
MiMo-V2-Pro enters a competitive field where OpenAI’s GPT-5.4 currently holds the top benchmark positions and Anthropic’s Claude Opus 4.6 leads on several agent tasks. At $1/$3 per million tokens, MiMo-V2-Pro is priced below both, which may appeal to developers and enterprises running high-volume workloads.
For Chinese-language tasks and enterprise applications where data residency matters, MiMo-V2-Pro’s second-place ranking among Chinese LLMs gives it a clear differentiator over primarily English-optimized models.
Real-world use cases and availability
Xiaomi has positioned MiMo-V2-Pro primarily for software development and agentic workflows, where its long context window and high Terminal-Bench scores make it practical for automated code review, repository analysis and multi-step task automation. MiMo-V2-Omni targets use cases such as video content analysis, voice-driven interfaces and customer service applications that involve mixed media inputs.
MiMo-V2-TTS is aimed at consumer products, in-car AI assistants and content creation tools where nuanced voice output matters. All three models are available via API and through a web interface. Xiaomi has indicated that the models will also power features in its own hardware ecosystem, which spans smartphones, electric vehicles and smart home devices.
Full technical documentation and API access for all three models are available on Xiaomi’s MiMo model page.