NVIDIA launches Nemotron 3 Super, a 120B open-weight hybrid MoE for agentic AI

On March 11, 2026, NVIDIA released Nemotron 3 Super, an open-weight model with 120B total parameters (12B active), built as a hybrid Mamba-Transformer Mixture-of-Experts (MoE) system for agentic AI. The company says the model supports a native 1M-token context window and targets long-horizon, multi-agent workflows such as software development and security triage. NVIDIA says the weights, datasets, and recipes are open, and the model is available through build.nvidia.com and Hugging Face.

NVIDIA’s developer blog describes an architecture that combines LatentMoE routing with multi-token prediction to raise throughput while keeping active parameters at 12B during inference. NVIDIA positions the model as the successor to Nemotron Super and claims more than 5x the throughput of the prior generation, though those figures are vendor-reported. NVIDIA also published a technical report and a research page detailing the training process and checkpoint variants.
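The article does not explain how LatentMoE routing works, so the sketch below uses plain top-k Mixture-of-Experts routing instead. That is a standard, simpler technique, shown here only to illustrate why an MoE model computes with far fewer parameters per token than it stores. All function names, the four toy "experts," and the router weights are hypothetical; nothing here reflects Nemotron 3 Super's actual expert count, top-k value, or code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize over the selected experts only, so their gates sum to 1.
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(x, experts, router_weights, k):
    """Compute one token's output by running only k of len(experts) experts."""
    # One router logit per expert: dot product of that expert's router row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    for idx, gate in route_token(logits, k):
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy setup (hypothetical): 4 "experts" that simply scale their input,
# standing in for the feed-forward networks a real MoE layer would use.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.1, 0.0], [1.0, 0.5], [-1.0, 0.0], [0.5, 0.25]]
x = [1.0, 2.0]

print(route_token([0.1, 2.0, -1.0, 1.0], k=2))
print(moe_layer(x, experts, router_weights, k=2))
```

Because only the k selected experts run for each token, per-token compute scales with k rather than with the total number of experts. By NVIDIA's figures, 12B of 120B parameters, or about 10%, are active per token; in a real model that ratio also depends on layers outside the experts that every token passes through.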

Independent testing is still emerging, but Artificial Analysis said it evaluated Nemotron 3 Super and rated its intelligence higher than that of other models in its openness tier, with roughly 10% higher throughput per GPU than gpt-oss-120b in its load test. According to the firm, that load test used 50k input tokens and 2k output tokens to approximate document-processing or code-analysis workloads. The firm also highlighted the model’s disclosures about training data and methodology, which it says are more open than those of most peers in the same size class.

IT之家 (IT Home) reported the launch, echoing NVIDIA’s emphasis on open weights and long-context agentic use cases and framing the release as part of NVIDIA’s broader push into open-model ecosystems. The real test for adoption will be whether licensing terms, hardware requirements, and third-party benchmarks make the model practical outside NVIDIA-optimized stacks.

Sources: