
GPT-5.4 Mini and Nano: OpenAI's Models for the Subagent Era

OpenAI launches GPT-5.4 Mini and Nano — two small models with near-flagship performance designed as building blocks for multi-agent AI pipelines.

S5 Labs Team · March 17, 2026

OpenAI released GPT-5.4 Mini and GPT-5.4 Nano today, completing its three-tier GPT-5.4 lineup. The two new models are explicitly framed for a specific deployment pattern: agentic pipelines where a central orchestrator routes subtasks to cheaper, faster specialists. OpenAI calls this the “subagent era” — a shift from single-model completions toward multi-agent systems where the intelligence lives in the architecture, not in any individual model.

Both models arrive smaller, faster, and significantly cheaper than GPT-5.4, while closing most of the performance gap on the benchmarks that matter most for autonomous work.

The Models

GPT-5.4 Mini targets developers who need near-flagship performance at sub-flagship cost. It ships with a 400k-token context window, supports text and image inputs plus tool use, and costs $0.75 per million input tokens and $4.50 per million output tokens. It also responds more than 2x faster than GPT-5 mini.

GPT-5.4 Nano is the more aggressive bet. At $0.20 per million input tokens and $1.25 per million output tokens, it's priced for high-volume classification, extraction, ranking, and coding subtasks where cost per call dominates every other consideration. It's available through the API only, with no consumer ChatGPT access at launch.

Model          Input ($/1M)   Output ($/1M)   Context   API   ChatGPT
GPT-5.4        $5.00          $30.00          1M        ✓     ✓
GPT-5.4 Mini   $0.75          $4.50           400k      ✓     ✓
GPT-5.4 Nano   $0.20          $1.25           ~128k     ✓     —

Benchmark Profile

On SWE-Bench Pro, the most demanding agentic coding benchmark in the GPT-5.4 evaluation suite, Mini scores 54.4% against the flagship's 57.7%, a gap of only 3.3 percentage points at roughly one-seventh the price. On OSWorld-Verified, which measures computer-use task completion, Mini posts 72.1% versus the flagship's 75.0%.

Benchmark                         GPT-5.4 (Flagship)   GPT-5.4 Mini   GPT-5.4 Nano
SWE-Bench Pro                     57.7%                54.4%          52.4%
OSWorld-Verified                  75.0%                72.1%          —
Terminal-Bench 2.0                60.0%                —              46.3%
τ2-bench (telecom tool-calling)   —                    93.4%          —

Nano’s profile reveals its intended role. The 52.4% SWE-Bench Pro score is strong for a model at this price point. The 46.3% on Terminal-Bench 2.0 signals it’s capable but not reliable enough to handle complex multi-step terminal workflows independently. The long-context performance — only 33.1% on 128k–256k needle retrieval — is the most significant constraint, making Nano unsuitable for tasks requiring extended memory across large documents.

What these numbers suggest is that Nano is designed to handle the bottom layer of an agent hierarchy: structured extraction, intent classification, data normalization, and tightly-scoped code generation. Not primary reasoning. Not extended context management. The specialized tasks that form the majority of compute in a production agentic system.
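That division of labor can be made concrete as a routing rule. The sketch below is illustrative only: the task categories, context thresholds, and model identifiers are assumptions drawn from the benchmark profile above, not an official routing policy.

```python
# Illustrative tier selection for a subagent pipeline. Thresholds and
# task buckets are assumptions based on the benchmark profile above.

NANO_TASKS = {"classification", "extraction", "normalization", "scoped_codegen"}
NANO_CONTEXT_LIMIT = 100_000   # stay well under Nano's ~128k window
MINI_CONTEXT_LIMIT = 400_000   # Mini's advertised context window

def pick_tier(task_type: str, context_tokens: int) -> str:
    """Return which GPT-5.4 tier should handle a subtask."""
    if context_tokens <= NANO_CONTEXT_LIMIT and task_type in NANO_TASKS:
        return "gpt-5.4-nano"   # high-volume, tightly scoped work
    if context_tokens <= MINI_CONTEXT_LIMIT:
        return "gpt-5.4-mini"   # near-flagship coding and tool use
    return "gpt-5.4"            # planning, evaluation, 1M-token context
```

The key design point is that Nano is gated on both dimensions at once: a task must be in its proven categories and fit comfortably inside its context window, reflecting the weak long-context retrieval noted above.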

Why “Subagent Era” Framing Matters

The positioning language is worth examining carefully. OpenAI didn’t release these as cheaper versions of GPT-5.4 for users who can’t afford the flagship. They released them as architectural components — the specialist nodes in a multi-agent graph.

This framing reflects how agentic AI architecture patterns have evolved in production. Early agent implementations used a single powerful model for everything, routing all reasoning through GPT-4 or Claude 3 Opus regardless of task complexity. This was expensive and often slower than necessary. The second generation introduced orchestrator/worker splits — a capable frontier model handles planning and evaluation while cheaper models handle execution. GPT-5.4 Mini and Nano are purpose-built for the worker tier of that architecture.
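The orchestrator/worker split can be sketched in a few lines. Here `call_model` is a stub standing in for a real inference call, and the control flow (plan, execute, review) is a simplified illustration of the second-generation pattern described above, not OpenAI's implementation.

```python
def call_model(model: str, prompt: str) -> str:
    """Stub for an inference call; returns a tagged echo for illustration."""
    return f"[{model}] {prompt}"

def run_pipeline(goal: str, subtasks: list[str]) -> str:
    # Orchestrator tier: the flagship handles planning and final evaluation.
    plan = call_model("gpt-5.4", f"plan: {goal}")
    # Worker tier: Mini executes each subtask from the plan.
    results = [call_model("gpt-5.4-mini", task) for task in subtasks]
    # The orchestrator reviews the workers' output before returning.
    return call_model("gpt-5.4", f"review: {plan} -> {results}")
```

Only two of the calls here hit the flagship regardless of how many subtasks fan out, which is exactly why the worker tier dominates token volume in production.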

The τ2-bench result — 93.4% on telecom tool-calling — suggests Mini is particularly well suited to high-volume structured tool invocation, exactly the pattern that dominates enterprise API workflows. A workflow that routes 90% of its tool calls to Mini instead of the GPT-5.4 flagship could cut inference costs to less than a quarter of an all-flagship baseline while maintaining near-identical task success rates on structured subtasks.
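The blended-cost arithmetic for that 90/10 split is worth making explicit. The 2k-in / 500-out call shape below is an assumption for illustration; because Mini's input and output prices are both about 6.7x below the flagship's, the blended fraction works out the same for any call shape.

```python
# Blended cost of a 90/10 Mini/flagship routing split.
# Call shape (2k input / 500 output tokens) is an illustrative assumption.

PRICES = {  # $ per 1M tokens: (input, output), from the launch pricing
    "gpt-5.4":      (5.00, 30.00),
    "gpt-5.4-mini": (0.75, 4.50),
}

def cost_per_call(model: str, in_tok: int, out_tok: int) -> float:
    p_in, p_out = PRICES[model]
    return (in_tok * p_in + out_tok * p_out) / 1_000_000

flagship = cost_per_call("gpt-5.4", 2_000, 500)       # all-flagship baseline
mini = cost_per_call("gpt-5.4-mini", 2_000, 500)
blended = 0.9 * mini + 0.1 * flagship                  # 90% of calls to Mini
savings = 1 - blended / flagship                       # fraction saved
```

The split spends about 23.5% of the all-flagship budget, a saving of roughly three-quarters; the residual 10% of flagship calls is what keeps the reduction well short of Mini's raw 6.7x price advantage.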

Codex and ChatGPT Integration

Mini is available directly within Codex, OpenAI’s developer agent environment, for code-generation subtasks. Within ChatGPT, it becomes the model behind tasks that don’t require the full flagship capability — routine summarization, classification, and structured extraction that would previously consume GPT-5.4 quota at full price.

This matters for build versus buy decisions around AI infrastructure. Organizations that built their agent pipelines against GPT-5.3 mini as the cost-effective tier now have a direct upgrade path: GPT-5.4 Mini offers meaningfully better performance on coding and computer use tasks at a comparable price point.

Competitive Context

Mini and Nano enter a market that has already reorganized around this architectural pattern. Anthropic’s Claude Haiku 4.5 serves the same general role in Claude-based pipelines. Google’s Gemini 3.1 Flash-Lite, which launched earlier this month at $0.25 per million input tokens, competes directly with Nano on price while offering significantly stronger long-context performance.

The gap that GPT-5.4 Mini and Nano close most visibly is in agentic coding benchmarks. On SWE-Bench Pro, Mini’s 54.4% significantly outperforms the small model tier from either Anthropic or Google. For development teams running automated coding workflows through multi-agent pipelines, this benchmark gap translates to fewer failed subtasks and less orchestrator overhead correcting errors.

The Economics of Scale

The practical ROI impact of Mini and Nano becomes clear at production scale. Consider a pipeline making 10,000 tool calls per hour, each averaging 2k tokens in and 500 tokens out:

  • Routing to GPT-5.4: ~$250/hour
  • Routing to GPT-5.4 Mini: ~$38/hour
  • Routing to GPT-5.4 Nano: ~$10/hour

For workloads where Mini or Nano’s capability is sufficient — structured extraction, classification, routine code generation — the cost differential is the difference between an economically viable product and an expensive research project.
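Those hourly figures follow directly from the launch pricing and the stated workload; a quick sketch, with the call count and token shape taken from the scenario above:

```python
# Hourly cost for each tier at 10,000 calls/hour, each call averaging
# 2k input tokens and 500 output tokens, at launch pricing.

PRICES = {  # $ per 1M tokens: (input, output)
    "gpt-5.4":      (5.00, 30.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def hourly_cost(model: str, calls: int = 10_000,
                in_tok: int = 2_000, out_tok: int = 500) -> float:
    """Dollars per hour for a tier at the given call volume and shape."""
    p_in, p_out = PRICES[model]
    return calls * (in_tok * p_in + out_tok * p_out) / 1_000_000
```

At this shape the flagship runs about $250/hour against roughly $38 for Mini and $10 for Nano, a 25x spread between the top and bottom tiers.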

The more interesting question is what happens to the frontier model market as the small model tier approaches flagship performance. If Mini can handle 90% of tasks at one-seventh the cost, the pressure on the overall pricing of GPT-5.4 intensifies. OpenAI is effectively cannibalizing its own flagship’s API revenue to capture the high-volume pipeline market before competitors can establish those relationships. At scale, the volume of subagent calls will likely exceed flagship model calls by orders of magnitude — making Mini and Nano more important to OpenAI’s revenue trajectory than the headline flagship model.

Want to discuss this topic?

We'd love to hear about your specific challenges and how we might help.