NVIDIA Releases Nemotron 3 Ultra and Cosmos 3

03-06-2026

Nemotron 3 Ultra and Cosmos 3 together extend NVIDIA’s AI portfolio beyond inference hardware and into the models themselves, offering developers open-weight options for both language reasoning and multimodal Physical AI applications.

Written by:

Jorick van Weelie

Marketing Lead at DataNorth | AI Enthusiast & Tech Storyteller

Published: June 3, 2026

NVIDIA announced Nemotron 3 Ultra and Cosmos 3 at its Computex 2026 keynote on June 1 in Taipei. Nemotron 3 Ultra is NVIDIA’s largest open-weight language model to date, featuring approximately 550 billion total parameters with 55 billion active parameters per token and a 1 million token context window. Cosmos 3 is the world’s first fully open omnimodel, unifying language, image, video, audio, and action generation in a single architecture for robotics and physical AI applications.

What is Nemotron 3 Ultra?

Nemotron 3 Ultra is a sparse Mixture of Experts model with approximately 550 billion total parameters. Thanks to 90% sparsity, only about 55 billion parameters are active per token, which keeps inference costs manageable despite the large total parameter count. The model supports a context length of up to 1 million tokens, putting it in the same long-context category as Google’s Gemini 3.5 Flash and Alibaba’s Qwen 3.7 Max.
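As a rough check on those figures, the sketch below derives the active fraction from the two parameter counts and uses a toy top-k router to show how a Mixture of Experts layer can run only a subset of its experts for each token. The function names, the ten-expert setup, and the router scores are illustrative assumptions; the announcement does not describe how Nemotron 3 Ultra’s router actually works.

```python
import math


def sparsity_summary(total_params: float, active_params: float) -> dict:
    """Return the active fraction and sparsity implied by two parameter counts."""
    active_fraction = active_params / total_params
    return {
        "active_fraction": active_fraction,
        "sparsity": 1.0 - active_fraction,
    }


def route_top_k(router_logits: list, k: int) -> dict:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating, not NVIDIA's published routing scheme.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


# Figures quoted in the article: ~550B total, ~55B active per token.
summary = sparsity_summary(550e9, 55e9)
print(f"active fraction: {summary['active_fraction']:.0%}")  # active fraction: 10%
print(f"sparsity: {summary['sparsity']:.0%}")                # sparsity: 90%

# Hypothetical router scores for 10 toy experts; only the top 1 runs for this token.
toy_logits = [0.1, 2.3, -0.5, 0.7, 1.9, -1.2, 0.0, 0.4, 1.1, -0.3]
print(route_top_k(toy_logits, k=1))  # {1: 1.0}
```

In the toy setup, running one of ten equally sized experts happens to reproduce the 10% active share. In real models the relationship is looser, because attention layers and any shared experts typically run for every token.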

NVIDIA designed Nemotron 3 Ultra specifically for agentic AI workloads, meaning tasks where the model needs to plan, reason across multiple steps, and use tools autonomously. The model is fully open-weight, making it available for researchers and developers to download, fine-tune, and deploy on their own infrastructure.

Nemotron 3 Ultra benchmarks and performance

Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, placing it ahead of the next-strongest US-based open-weight models. However, the model still trails the Chinese-led open-weight frontier. Qwen 3.7 Max, for example, scores approximately 92.4 on GPQA Diamond, although that figure comes from a different benchmark and is not directly comparable to the index score. On inference speed, a pre-release endpoint on DeepInfra served over 300 tokens per second.
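To put the throughput figure in perspective, this small sketch converts tokens per second into decode time for a response of a given length. It assumes the 300 tokens per second figure applies to a single output stream and ignores prompt processing and time to first token; the function name and the 3,000-token response length are hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for num_tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed example: a 3,000-token answer at the reported 300 tokens per second.
print(generation_seconds(3000, 300.0))  # 10.0
```

Because the reported rate is "over 300", the result is best read as an upper bound on decode time under these assumptions.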

The positioning of Nemotron 3 Ultra is notable: NVIDIA is not trying to compete with the largest closed-source frontier models from OpenAI or Anthropic on raw intelligence benchmarks. Instead, it targets the open-weight segment where developers need a capable, customizable model they can run and modify themselves, particularly for enterprise agentic applications where data sovereignty and customization matter.

What is Cosmos 3 and how does it differ from language models?

Cosmos 3 is a fundamentally different kind of model. Rather than processing and generating only text, it unifies language, image, video, audio, and action in a single Mixture-of-Transformers architecture. The design pairs an autoregressive reasoner with a diffusion generator, enabling the model to both understand and generate across all these modalities natively.

NVIDIA describes Cosmos 3 as purpose-built for Physical AI and robotics. The model can reason about visual scenes, generate images and video, process audio, and output action sequences for robotic systems, all within a single unified model. On public leaderboards, Cosmos 3 reached first place among open-weight models on both text-to-image and image-to-video benchmarks. Like Nemotron 3 Ultra, Cosmos 3 is fully open.

How do Nemotron 3 Ultra and Cosmos 3 compare to competitors?

In the open-weight language model space, Nemotron 3 Ultra competes primarily with Meta’s Llama 4 Scout, Alibaba’s Qwen 3.7 Max, and Mistral’s Devstral 2. At 550 billion total parameters, it is among the largest open-weight models available, and its 1 million token context window matches the best in its class. The key differentiator is NVIDIA’s focus on agentic AI: the model is optimized for multi-step tool use and planning rather than general chat or creative writing.

Cosmos 3 has fewer direct competitors, as the fully open omnimodel category is still emerging. The closest comparisons are Google’s Gemini models, which also handle multiple modalities, but Gemini is closed-source. By making Cosmos 3 open, NVIDIA is betting that the robotics and Physical AI developer community will adopt it as a foundational building block for autonomous systems.

Nemotron 3 Ultra and Cosmos 3 availability

Both Nemotron 3 Ultra and Cosmos 3 are available as open-weight models. Nemotron 3 Ultra is accessible through NVIDIA’s developer portal and is already served on inference platforms like DeepInfra. NVIDIA has positioned both models as part of its broader strategy to become an end-to-end AI infrastructure provider, moving beyond its traditional role as a chip vendor.

The official Nemotron 3 family announcement is available on the NVIDIA Newsroom, and technical details can be found on the NVIDIA Research page.