
NVIDIA Nemotron 3 Super: The Open Model Built for Multi-Agent Systems

NVIDIA's Nemotron 3 Super combines Mamba and Transformer architectures in a 120B-parameter open model with 4x better memory efficiency and a 1M token context window.

S5 Labs Team March 13, 2026

NVIDIA announced Nemotron 3 Super at GTC 2026, an open-weights model that takes a different architectural approach than the pure Transformer models that dominate the frontier. The 120-billion-parameter model uses a hybrid Mixture-of-Experts Mamba-Transformer architecture in which only 12 billion parameters activate per forward pass. It ships with a 1-million-token context window, permissive licensing, and throughput numbers that position it as NVIDIA's answer to what the next generation of open models for agentic systems should look like.
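The 12B-active-of-120B figure follows from how Mixture-of-Experts routing works: each token is sent to only a few experts out of many, so most expert parameters sit idle on any given forward pass. A minimal sketch of the arithmetic, with illustrative expert counts that are assumptions and not NVIDIA's published configuration:

```python
# Toy illustration of why an MoE forward pass touches only a fraction of
# total parameters. The expert counts below are illustrative assumptions,
# not Nemotron 3 Super's actual routing configuration.

def active_fraction(n_experts: int, top_k: int) -> float:
    """Fraction of expert parameters activated per token with top-k routing."""
    return top_k / n_experts

# Routing each token to 2 of 16 experts touches 12.5% of expert parameters,
# in the same spirit as 12B active out of 120B total (10%).
frac = active_fraction(16, 2)
```

The practical consequence is that per-token compute scales with the active parameter count, not the total, which is what makes a 120B model economical to serve.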

Architecture: Why Mamba Matters

The headline architectural choice is the hybrid design. Nemotron 3 Super interleaves Mamba layers — a state-space model architecture — with standard Transformer attention layers. The two serve different roles: Mamba handles memory and sequential processing with 4x better memory and compute efficiency than equivalent Transformer layers, while the Transformer layers carry the reasoning capability that requires attention across the full context.
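The interleaving described above can be sketched as a layer schedule: mostly Mamba (state-space) layers, with attention layers inserted periodically to provide full-context reasoning. The ratio below is a made-up illustration; NVIDIA has not published Nemotron 3 Super's exact layer pattern:

```python
# Sketch of a hybrid layer schedule. The every-Nth-layer attention ratio
# is an assumption for illustration, not the model's real configuration.

def hybrid_schedule(n_layers: int, attention_every: int = 6) -> list[str]:
    """Interleave Mamba (SSM) layers with periodic attention layers.

    Mamba layers carry sequential state at linear cost; the sparser
    attention layers supply reasoning over the full context.
    """
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(n_layers)
    ]

schedule = hybrid_schedule(12, attention_every=4)
# e.g. ['mamba', 'mamba', 'mamba', 'attention', ...]
```

The design trade-off is that the expensive attention layers become a minority of the stack, so their quadratic cost is paid only a fraction of the time.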

The practical result of this combination is a model that can maintain efficient state across very long sequences without the quadratic attention cost that limits pure transformer context windows at scale. The 1M token context window isn’t just bolted on; the Mamba layers make it economically viable to run long-context inference without the hardware cost that would make it impractical for deployment outside of hyperscale data centers.
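A back-of-envelope comparison makes the scaling argument concrete: attention's score computation grows quadratically with sequence length, while a state-space scan grows linearly. The dimensions below are illustrative assumptions, not Nemotron 3 Super's real hidden sizes:

```python
# Back-of-envelope scaling sketch. Head and state dimensions are
# illustrative assumptions, not the model's actual configuration.

def attention_score_flops(seq_len: int, d_head: int = 128) -> int:
    """QK^T score computation scales quadratically in sequence length."""
    return seq_len * seq_len * d_head

def ssm_scan_flops(seq_len: int, d_state: int = 128) -> int:
    """A state-space scan scales linearly in sequence length."""
    return seq_len * d_state

# At a 1M-token context, the quadratic term dominates by a factor of the
# sequence length itself.
ratio = attention_score_flops(1_000_000) / ssm_scan_flops(1_000_000)
```

Under these assumptions the gap at 1M tokens is a factor of one million, which is why replacing most attention layers with Mamba layers changes the economics of long-context inference rather than just shaving a constant.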

For workloads that involve extended context management — document analysis, long-session agents, codebases spanning millions of tokens — this architectural choice has real implications for deployment cost and feasibility.

Throughput and Efficiency Numbers

NVIDIA’s benchmark comparisons against comparable open models are the most concrete way to understand what the architecture delivers:

| Comparison | Throughput Advantage |
| --- | --- |
| vs. Previous Nemotron Super | 5x higher throughput |
| vs. GPT-OSS-120B | 2.2x higher throughput |
| vs. Qwen3.5-122B | 7.5x higher throughput (8k input / 16k output) |

The Qwen comparison is particularly notable. Qwen3.5-122B has been a widely-deployed open model in multi-agent pipelines, valued for its strong benchmark performance in its weight class. A 7.5x throughput advantage at comparable parameter count represents a substantial infrastructure efficiency improvement for any team running Nemotron in place of Qwen.

On PinchBench, NVIDIA’s internal benchmark for production agent performance, Nemotron 3 Super scores 85.6%, which NVIDIA claims as the highest score for an open model in its class at time of release.

Open Weights, Permissive License

Nemotron 3 Super ships as open weights with a permissive license — meaning organizations can download, fine-tune, and deploy the model without the usage restrictions that govern proprietary API models. For teams that have committed to open-source AI models as a foundational architectural choice — whether for data privacy, cost control, or operational independence from API providers — Nemotron 3 Super is the first hybrid-architecture model in this parameter class with production-grade throughput numbers.

NVIDIA is deploying it across workstations, data center configurations, and cloud providers, which matters for the variety of deployment patterns organizations actually use. A model that can run on an on-premise workstation cluster for a team with data locality requirements, and also scale on cloud infrastructure for peak demand, is more practically useful than a model that only makes sense in one deployment environment.

Target Use Cases

NVIDIA explicitly designed Nemotron 3 Super for multi-agent applications. The three use cases highlighted in the announcement are software development (particularly code generation and review pipelines), cybersecurity triaging (analyzing logs, identifying anomalies, generating remediation steps), and agentic workflows generally.

The software development framing connects to the broader theme of this release cycle. Both Anthropic’s Claude Code Review and OpenAI’s Mini and Nano announcements this week positioned their models for automated development pipelines. Nemotron 3 Super extends that pattern to teams who need an open, self-hosted model that can run the worker tier of those pipelines.

The cybersecurity use case is worth noting separately. Security operations centers handle high-volume log analysis and incident response workflows where API latency and data residency requirements often rule out cloud-hosted models. An open model with Nemotron 3 Super’s throughput numbers that can run on-premise addresses a real gap in the open model ecosystem for security applications.

How It Fits in the Open Model Landscape

The dominant open models in production today — Llama 4 variants, Qwen3.5, Mistral models — are all pure Transformer architectures. The efficiency gains available through architectural innovation have largely been explored by smaller research models. Nemotron 3 Super is the first production-grade implementation of the hybrid Mamba-Transformer-MoE architecture in this parameter class.

Whether the architectural approach becomes a template for the next generation of open models depends on whether the efficiency and throughput advantages hold up across deployment environments and use cases beyond NVIDIA’s benchmarks. The Qwen comparison is compelling on the published numbers; developers running Nemotron in their own infrastructure will determine whether the efficiency gains translate outside the controlled benchmark environment.

For now, Nemotron 3 Super occupies a clear niche: the highest-throughput open model available in the 120B class, with a license that permits fine-tuning and commercial deployment, built explicitly for the multi-agent pipelines that are becoming the standard automation architecture pattern in enterprise software.

The GTC Context

NVIDIA has been using GTC as a product platform for years, but the AI software announcements increasingly stand alongside the hardware. The launch of Nemotron 3 Super alongside NVIDIA’s new GPU announcements signals that NVIDIA views open model software as part of its competitive position — not just a downstream consumer of its chips.

A high-throughput open model that demonstrates the capabilities of NVIDIA hardware architecture is simultaneously a research contribution, a developer tool, and a hardware sales argument. The 5x throughput improvement over the previous Nemotron Super isn’t just a capability claim; it’s a demonstration that NVIDIA’s new hardware, running NVIDIA’s software, delivers efficiency gains that matter for the practical economics of large-scale agentic deployments.
