Two things happened to the AI frontier in 2026, and they point in opposite directions. One set of labs started locking their most capable models behind vetted-partner programs — no public API, no signup, no credit card. Another set started giving their models away under MIT licenses, weights and all. Both trends are accelerating, and together they break the question most buyers still ask. “What’s the best model” is splitting into two questions with different answers: the best model you can buy, and the best model you can run.
This is not the usual story about benchmarks creeping upward. It’s a fork in how frontier capability gets distributed at all.
The Gated-Specialist Pattern
Inside ten days this spring, three frontier releases ran the same playbook. On April 7, Anthropic announced Claude Mythos and Project Glasswing — a model so good at finding software vulnerabilities (a 27-year-old bug in OpenBSD, a 17-year-old remote-code-execution flaw in FreeBSD’s NFS server) that the company decided not to release it publicly at all, gating it instead to a few dozen vetted critical-infrastructure organizations. A week later, OpenAI shipped GPT-5.4-Cyber, a cyber-permissive variant with binary reverse-engineering capability, distributed only through its Trusted Access for Cyber program. Three days after that, OpenAI launched GPT-Rosalind, a life-sciences reasoning model handed to a short list of vetted labs — Amgen, Moderna, Thermo Fisher, the Allen Institute — and nobody else.
Strip away the domains and the template is identical: build a capability-dense model, often tuned to a single high-stakes field, then release it to an accountable partner list under contractual use restrictions, with no public surface. The driver each lab cited was dual-use risk, and the logic holds. The reasoning that lets a model find a zero-day also lets it write one. The reasoning that helps design a therapy can just as easily be turned toward a pathogen. Publishing those capabilities to an open endpoint hands the same uplift to every defender and every attacker simultaneously, so the labs stopped publishing them.
It would be naive to read this as pure safety, though. A gated program is also a moat and a high-ACV enterprise channel. Mythos is priced at $25 / $125 per million input/output tokens. That is not a number set for experimentation. It’s priced for organizations with budgets and compliance departments. The safety rationale is real, and the business model rides comfortably alongside it. Both things are true.
The Open-Weight Counter-Current
Now look at the other half of the year. The cheapest serious models on the market are the ones you can download and keep. DeepSeek V4 shipped 1.6 trillion parameters and a million-token context window under an open license at a fraction of GPT-5.5’s price. GLM 5.2 matches or edges out GPT-5.5 on agentic coding at $1.40 / $4.40 per million input/output tokens, MIT-licensed, weights published. Kimi K2.7 Code undercuts the field again at $0.95 / $4.00, also open.
The strategy here is the exact inverse of gating. Maximize distribution, compete on cost, and treat the capability as already loose in the world — because for general coding and reasoning, it is. You cannot put DeepSeek V4 back in the box; the weights are already on disks all over the world. So the labs releasing this way don’t try. They price for volume and let the market route around the expensive closed APIs.
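To make the price gap concrete, here is a small Python sketch that prices one hypothetical monthly workload at the list prices quoted above. The workload size (50 million input tokens, 10 million output tokens) and the `Pricing` class are illustrative assumptions, not anything a vendor publishes, and the arithmetic ignores volume discounts and the hardware and operating costs of self-hosting.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Pricing:
    """List price in USD per million tokens (hypothetical helper type)."""
    name: str
    input_per_m: Decimal
    output_per_m: Decimal

    def monthly_cost(self, input_tokens: int, output_tokens: int) -> Decimal:
        """USD cost of a workload with the given input/output token counts."""
        million = Decimal(1_000_000)
        return (Decimal(input_tokens) / million * self.input_per_m
                + Decimal(output_tokens) / million * self.output_per_m)


# Prices as quoted in this article (USD per million input / output tokens).
PRICES = [
    Pricing("Claude Mythos", Decimal("25"), Decimal("125")),
    Pricing("GLM 5.2", Decimal("1.40"), Decimal("4.40")),
    Pricing("Kimi K2.7 Code", Decimal("0.95"), Decimal("4.00")),
]

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

if __name__ == "__main__":
    for p in PRICES:
        cost = p.monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS)
        print(f"{p.name:<15} ${cost:>10,.2f}")
```

At these list prices the workload comes to $2,500 on Mythos, $114 on GLM 5.2, and $87.50 on Kimi K2.7 Code, a spread of roughly 22x to 29x. The comparison is deliberately crude: Mythos is a gated security specialist, not a drop-in coding alternative, so the numbers illustrate the two pricing regimes rather than a like-for-like choice. `Decimal` is used instead of floats so the currency arithmetic stays exact.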
Two Answers to the Same Question
The split isn’t random. Both camps are answering one question — what do you do with a model too capable to hand out carelessly? — and they’ve reached opposite conclusions, mostly because they’re talking about different capabilities.
| | Gated specialist | Open generalist |
|---|---|---|
| Examples | GPT-Rosalind, GPT-5.4-Cyber, Claude Mythos | GLM 5.2, Kimi K2.7, DeepSeek V4 |
| How you get it | Apply, qualify, sign | Download the weights |
| Distribution | Restricted by design | Maximized by design |
| Pricing (USD per million input / output tokens) | Enterprise (Mythos $25 / $125) | Floor (Kimi $0.95 / $4.00) |
| Where your data goes | Held by the vendor under contract | Stays on your infrastructure |
| The underlying bet | This capability is dangerous; control who holds it | This capability is unstoppable; compete on cost |
The dividing line is the nature of the capability, not the lab’s philosophy. Bioweapon-adjacent reasoning and industrial-grade vulnerability discovery are genuinely dangerous in a way that fast, cheap code generation is not. It is coherent for the same industry to gate a drug-discovery model and open-source a coding model in the same quarter, and that’s roughly what happened. The open-weight wave is also, not incidentally, led by Chinese labs that have made cost the entire battlefield — undercutting the closed frontier is a competitive strategy as much as an ideological one.
What This Means If You’re Buying Vertical AI
For most businesses, the practical consequence is blunt: you will not get GPT-Rosalind or Mythos. Those programs are not procurable. They require lab capacity, a credentialed security team, regulatory accountability, and a contract: qualifications built to screen out almost everyone, and they do. If your vertical-AI plan depends on frontier-grade capability in a gated domain, the model is not the deliverable you can buy. The partnership is, and you have to qualify for it.
That reframes the decision along three axes that didn’t matter much when every model was a public API away.
Access is now a gate, not a price. The first question about a specialized model is no longer “can I afford it” but “can I get it at all.” For genuinely high-stakes work, plan for an application process and a procurement timeline, not a self-serve API key.
Compliance cuts both ways. A gated program comes with the vendor’s trust-and-safety apparatus, contractual use restrictions, and — critically — your data flowing through someone else’s infrastructure. An open-weight model you self-host gives you the opposite tradeoff: full data residency and control, and full ownership of the risk that comes with running a capable model yourself. Neither is automatically the safer choice; it depends on what you’re handling and who’s asking. This is exactly the territory an AI governance checklist is built to walk you through before you sign anything.
Build-versus-buy gets sharper, not blurrier. The open-weight tier has made “build” cheaper and more credible than it was a year ago — you can self-host a near-frontier coder for the cost of the hardware. The gated tier has made “buy” the only option for certain capabilities. The middle, where you used to default to a general public API, is where the real evaluation now lives. The build-vs-buy framework still applies; the inputs have just moved.
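The three axes can be compressed into a deliberately coarse decision sketch. Everything in it is hypothetical: the `Workload` fields, the `Regime` labels, and the order of the checks are one illustrative reading of the argument above, not a procurement rule, and a real decision would also weigh cost, quality, and who owns the risk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Regime(Enum):
    """Hypothetical labels for the access regimes discussed above."""
    GATED_PARTNERSHIP = "qualify for a vetted-partner program"
    SELF_HOST_OPEN_WEIGHT = "self-host an open-weight model"
    EVALUATE_BUILD_VS_BUY = "compare self-hosting against closed APIs"


@dataclass(frozen=True)
class Workload:
    """Hypothetical, intentionally coarse description of one workload."""
    needs_gated_capability: bool   # e.g. frontier vuln discovery or bio reasoning
    data_must_stay_in_house: bool  # residency or control requirement
    can_qualify_for_program: bool  # credentials a gated program screens for


def choose_regime(w: Workload) -> Optional[Regime]:
    """Map a workload to an access regime; None means nothing is procurable."""
    if w.needs_gated_capability:
        # Access is a gate, not a price: without qualifying, there is nothing to buy.
        return Regime.GATED_PARTNERSHIP if w.can_qualify_for_program else None
    if w.data_must_stay_in_house:
        # Residency rules out sending data to a vendor; own the model and its risk.
        return Regime.SELF_HOST_OPEN_WEIGHT
    # The contested middle: general capability, no hard residency constraint.
    return Regime.EVALUATE_BUILD_VS_BUY
```

The order of the checks encodes the claim that access is now a gate: a workload needing a gated capability never reaches the price comparison. The sketch also glosses over one real tension, a gated workload whose data must stay in-house, which is a question for the governance review rather than for a three-branch function.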
The Buyer’s Real Question
For two years the sensible default was to pick the strongest general model and route everything through it. That default is breaking. The frontier no longer resolves to a single leaderboard, because capability and access strategy have come apart — the most capable model for your problem might be one you can only reach through a vetted program, or one you have to host yourself, or one whose price floor makes the closed APIs hard to justify.
So the question worth asking isn’t which model tops the benchmark this month. It’s which access regime fits the workload in front of you: the risk it carries, the data it touches, and whether you’re trying to buy a capability or own one. Get that framing right and the model selection mostly follows. Get it wrong and you’ll spend a procurement cycle trying to buy something that was never for sale, or self-hosting something you should have rented.
