NVIDIA Launches Cosmos 3

01-06-2026

Cosmos 3 is the first fully open omnimodel that unifies physical AI reasoning, world simulation, and action generation in a single architecture.

Written by:

Jorick van Weelie

Marketing Lead at DataNorth | AI Enthusiast & Tech Storyteller

On May 31, 2026, at GTC Taipei, NVIDIA launched Cosmos 3, an open world foundation model for physical AI that combines vision reasoning, world generation, and action prediction in a single system. Cosmos 3 is the first fully open omnimodel that can natively understand and generate text, images, video, ambient sound, and actions, and it was trained on 20 trillion tokens of multimodal data. The model comes in two sizes, Super (32B parameters) and Nano (8B parameters), and targets developers of robotics, autonomous vehicle, and vision AI agent systems who previously had to juggle several separate models for these tasks.

What is NVIDIA Cosmos 3 and what does it do?

Cosmos 3 represents a fundamental shift in how physical AI systems are built. Previous versions of the Cosmos platform required developers to work with separate models for world generation (Cosmos Predict), controlled generation (Cosmos Transfer), scene understanding (Cosmos Reason), and policy generation (Cosmos Policy). Cosmos 3 unifies all of these capabilities in a single omnimodel.

The model is built on a new Mixture-of-Transformers (MoT) architecture consisting of two towers. The Reasoner tower is a vision-language model that uses an autoregressive architecture to interpret multimodal observations (images, videos, text) and understand motion, object interactions, and spatial-temporal relationships. The Generator tower uses a diffusion-based process to produce physics-aware video and action outputs conditioned on the Reasoner's understanding. In practical terms, Cosmos 3 can serve as a vision-language model that reasons across modalities, as a world model that simulates physical environments and predicts future states, or as the backbone for world action models that train robots to perform specific tasks.
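To make the two-tower division of labor more concrete, here is a deliberately tiny, pure-Python toy. It is not Cosmos 3 code and does not reproduce the Mixture-of-Transformers internals (no attention, no shared layers, no training). The names `reasoner`, `generator`, and `ReasonerOutput`, along with the numeric rules inside them, are hypothetical. The toy only illustrates the flow described above: an autoregressive stage produces its output one step at a time, with each step depending on the previous ones. A heavily simplified, deterministic diffusion-style stage then starts from random noise and refines it over several steps, conditioned on what the first stage produced.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ReasonerOutput:
    tokens: list[str]        # hypothetical "reasoning" tokens
    condition: list[float]   # hypothetical conditioning vector for the generator


def reasoner(observation: list[float], max_tokens: int = 4) -> ReasonerOutput:
    """Toy autoregressive stage: each new token depends on a running state
    that carries information from every token emitted before it."""
    tokens: list[str] = []
    state = 0.0
    for t in range(max_tokens):
        # New state = half the previous state + the next observation value.
        state = 0.5 * state + observation[t % len(observation)]
        tokens.append("up" if state > 0 else "down")
    return ReasonerOutput(tokens=tokens, condition=[state, float(len(tokens))])


def generator(condition: list[float], dim: int = 3, steps: int = 50, seed: int = 0) -> list[float]:
    """Toy diffusion-style stage: start from Gaussian noise and iteratively
    move toward a 'clean' prediction derived from the reasoner's condition."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Stand-in for a trained denoiser's prediction of the clean sample.
    predicted_clean = [condition[0] * (i + 1) for i in range(dim)]
    for remaining in range(steps, 0, -1):
        # Close 1/remaining of the gap to the prediction; the last step closes it fully.
        frac = 1.0 / remaining
        x = [xi + frac * (ci - xi) for xi, ci in zip(x, predicted_clean)]
    return x


if __name__ == "__main__":
    out = reasoner([1.0, -0.5, 2.0])
    print(out.tokens)                # ['up', 'down', 'up', 'up']
    print(generator(out.condition))  # approximately [2.0, 4.0, 6.0]
```

When run, the script prints the four toy tokens and a vector that starts as noise and converges on the condition-derived prediction after the final refinement step. In a real system, both stages are large learned networks rather than hand-written rules.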

NVIDIA Cosmos 3 benchmarks and technical specifications

NVIDIA is releasing Cosmos 3 in two model sizes. Cosmos 3 Super is the 32B-parameter variant (32B reasoner + 32B generator), designed for datacenter deployment on NVIDIA Hopper and Blackwell GPUs and targeting large-scale synthetic data generation and advanced physical reasoning workloads. Cosmos 3 Nano is the 8B-parameter variant (8B reasoner + 8B generator), optimized for efficient inference on workstation-grade hardware such as the NVIDIA RTX PRO 6000 GPU. A third variant, Cosmos 3 Edge, is coming soon for real-time inference at the edge.
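As a rough, back-of-the-envelope check on why the two sizes target different hardware, the sketch below estimates how much memory the weights alone would occupy. It takes the parentheticals above at face value and assumes both towers are resident in memory (32B + 32B for Super, 8B + 8B for Nano). It also assumes weights are stored at 16-bit (2 bytes) or 8-bit (1 byte) precision. The announcement does not state which precisions NVIDIA actually ships, and activations, caches, and framework overhead are ignored. The names `TOWERS_B`, `BYTES_PER_PARAM`, and `weight_memory_gb` are hypothetical.

```python
# Back-of-the-envelope weight-memory estimate (weights only; no activations or caches).
# Assumption: both towers are loaded, using the sizes quoted in the announcement.
TOWERS_B = {
    "Cosmos 3 Super": (32, 32),  # (reasoner, generator), billions of parameters
    "Cosmos 3 Nano": (8, 8),
}
BYTES_PER_PARAM = {"16-bit": 2, "8-bit": 1}  # assumed storage precisions


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, towers in TOWERS_B.items():
        total = sum(towers)
        for precision, nbytes in BYTES_PER_PARAM.items():
            gb = weight_memory_gb(total, nbytes)
            print(f"{name}: {total}B params @ {precision} ~ {gb:.0f} GB")
```

Under these assumptions, Super's weights come to about 128 GB at 16-bit and Nano's to about 32 GB, a fourfold gap. That gap is consistent with Super being pitched at datacenter GPUs and Nano at workstation hardware. If the towers share parameters, or if only one tower is loaded for a given task, the figures shrink accordingly.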

NVIDIA trained Cosmos 3 on 20 trillion tokens of multimodal data, including nearly a billion images, 400 million real and synthetic videos, ambient audio, text, and action data from humans and robots. Among open models, Cosmos 3 ranks first on Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation accuracy, first on RoboLab and RoboArena for action policy quality, and first on VANTAGE-Bench and TAR for vision understanding. Cosmos 3 Super leads at the 32B tier on VANTAGE-Bench, while Cosmos 3 Nano leads at the 8B tier.

How does Cosmos 3 compare to previous Cosmos models?

The most significant change from Cosmos 2.5 to Cosmos 3 is architectural. Earlier Cosmos releases consisted of separate specialized models: Cosmos Predict for world simulation, Cosmos Transfer for controlled generation, Cosmos Reason for scene understanding, and Cosmos Policy for robot action generation. Developers had to manage multiple models and inference pipelines, which added complexity and latency to physical AI workflows.

Cosmos 3 replaces that fragmented approach with a single unified model that handles reasoning, generation, and action prediction together. According to NVIDIA, this reduces physical AI training and evaluation cycles from months to days. The Cosmos platform now also includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety, and spatial reasoning, as well as new agent skills for neural scene reconstruction, defect-image generation, and video augmentation.

Cosmos Coalition and early adopters

Alongside the model launch, NVIDIA announced the Cosmos Coalition, a collaboration between world model builders and AI developers including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. Coalition members contribute models, research, and evaluation techniques while using Cosmos 3 technologies and NVIDIA DGX Cloud infrastructure for large-scale training.

Physical AI developers already building on the Cosmos platform include Agile Robots, Doosan Robotics, LG Electronics, Samsung, and Skild AI for robotics applications; Li Auto for autonomous vehicles; and Centific, Fogsphere, Linker Vision, Milestone Systems, and Yuan for vision AI agents in industrial and smart-space applications.

NVIDIA Cosmos 3 availability and licensing

Cosmos 3 Super and Cosmos 3 Nano are available immediately. Developers can try Cosmos 3 on build.nvidia.com, download open model weights from Hugging Face, and customize models using Hugging Face Diffusers and resources on GitHub. The models can also be deployed as NVIDIA NIM microservices. Cloud infrastructure partners include Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra, and Classmethod.

Cosmos 3 is released under the NVIDIA Open Model License, which permits commercial use and allows developers to create and distribute derivative models. NVIDIA does not claim ownership of outputs generated using Cosmos 3 or its derivatives. Training scripts, deployment tools, and the datasets used to train Cosmos 3 are all open-sourced on GitHub.

NVIDIA Cosmos 3 was announced by Jensen Huang during his keynote at GTC Taipei as part of Computex 2026.

The full announcement and technical documentation are available on the NVIDIA Newsroom and NVIDIA Cosmos developer page.