Z.ai shipped GLM 5.2 in stages. On June 13 it went live for subscribers to the company’s GLM Coding Plan — its answer to a Claude Code or Cursor subscription — with marketing copy and no benchmarks. Three days later the rest arrived at once: open weights on Hugging Face under an MIT license, a standalone API, and the first full benchmark scorecard. (Z.ai’s own blog post is dated June 17; the timezone-straddling rollout is why coverage disagrees by a day.)
The headline the company wants is “beats GPT-5.5.” The honest version is narrower and more interesting. On its own scorecard, GLM 5.2 edges GPT-5.5 on a specific class of long-horizon coding benchmark, at roughly one-sixth the API price, while trailing Claude Opus 4.8 on most coding work and losing terminal coding outright. It is not the frontier. It is the cheapest credible thing near it that you can also download — which is the same story DeepSeek V4 told in April, only this time the model is a dedicated coder.
The Architecture It Didn’t Quite Disclose
GLM 5.2 is a mixture-of-experts model, and Z.ai never restated its shape in prose at launch. From the released config and Hugging Face’s computed parameter tag, it lands at about 753 billion total parameters with roughly 40 billion active per token — the active figure inherited from the GLM-5 base it’s built on rather than separately published. You will see 744B quoted elsewhere; that is the GLM-5 base number carried forward, the same-sized model counted a slightly different way. Either way the activation rate sits near 5%, the lever that keeps inference cheap on a model this large.
The number that matters for coding is the context window: 1,048,576 tokens — a literal 2²⁰, native rather than RoPE-extended, and five times GLM 5.1’s 200K. Z.ai credits a DeepSeek-style sparse-attention scheme (it calls the trick “IndexShare,” reusing the attention indexer across blocks of layers) for making a window that large usable instead of nominal. For a model aimed at long-horizon agentic coding — where “context” means an entire repository plus a running task history — a working 1M window is the actual feature. It is the same architectural bet DeepSeek made to reach 1M context at a fraction of the compute.
Where It Actually Sits on the Benchmarks
Every score below comes from Z.ai’s own scorecard, so read them as reported rather than established. The pattern is consistent and worth stating plainly:
- SWE-bench Pro: 62.1 — ahead of GPT-5.5 (58.6), behind Opus 4.8 (69.2)
- FrontierSWE (long-horizon “Dominance”): 74.4 — edges GPT-5.5 (72.6), within a point of Opus 4.8 (75.1)
- SWE-Marathon: 13.0 — past GPT-5.5 (12.0), but Opus 4.8 doubles it at 26.0
- Terminal-Bench 2.1: 81.0 — and here it loses to both, GPT-5.5 at 84 and Opus 4.8 at 85
On knowledge and reasoning the gap widens — GPQA-Diamond at 91.2 trails the frontier by a couple of points. So “beats GPT-5.5” is true on the agentic, multi-step coding suite and false on raw terminal coding and on knowledge. That is a real and useful result, but it is a benchmark-specific one, and it does not make GLM 5.2 a frontier-beating model.
Two caveats matter more than any single row. First, these are self-reported numbers on Z.ai’s own evaluation scaffold; there is no neutral, single-harness run, and Chinese labs’ internal figures have a habit of cooling under independent testing. Second — and this cuts the other way — the one result Z.ai did not generate is the most persuasive: GLM 5.2 at its “Max” effort level sits at #2 on Code Arena, a human-preference leaderboard, and it tops the Artificial Analysis Intelligence Index among open models. The model ships with two effort tiers, High and Max, at the same per-token rate, so the difference is latency and token spend, not price.
The Pricing Is the Actual Product
API access runs $1.40 per million input tokens and $4.40 per million output, with cached input at $0.26. Set against GPT-5.5 at $5/$30, that is the “one-sixth the cost” line — roughly $5.80 versus $35 on a naive input-plus-output basis.
Treat the ratio as directional, not exact. It moves with your input/output mix, and GLM 5.2 is verbose: it spends a lot of tokens reasoning, which narrows the effective per-task gap on heavy work. But the direction is not in dispute, and it is the same direction every Chinese open-weight release has pushed — DeepSeek V4, MiniMax M2.5, Qwen 3.5. The cost floor on capable coding inference is set in China now. Western labs can price above it for the capability gap and the trust premium, but they no longer price independently of it.
MIT Weights Versus the Hosted API
The “China data risk” warning attached to GLM 5.2 is real, but it is specifically a hosted-API risk. Calling Z.ai’s endpoint sends your prompts to a Chinese provider subject to PIPL, the Data Security Law, and the National Intelligence Law; Zhipu is state-adjacent, drew a flag from China’s own cyber-security reporting center in 2025 over app data collection, and was named in a May 2026 US House inquiry into PRC-origin AI in critical infrastructure. For regulated workloads, that is a legitimate reason to keep the API at arm’s length.
What changes the calculus is that the weights are genuinely downloadable under MIT — full commercial use, no field-of-use clauses, more permissive than Gemma 4’s Apache-with-restrictions or Llama’s community license. A team with the hardware can self-host and keep the data in the building, the way the open-weight argument has always promised. The limits are real: self-hosting fixes data residency, not whatever alignment and censorship are baked into the weights, and it does not erase the export-control exposure of running a 753B model on the GPUs it demands. At roughly 1.5TB in BF16 this is a cluster-class deployment — though early reports say 2-bit quantization is “surprisingly usable” (around 241GB), because a sparse mixture spreads quantization error across experts that aren’t active on any given token.
What to Do With It
For builders, GLM 5.2 is now the rational default for price-sensitive agentic coding where you control the deployment — repo-scale refactors, long-horizon agent runs, internal dev tooling — provided you either self-host or accept the hosted-API posture for code that isn’t sensitive. For frontier-critical work the gap to Opus 4.8 is small but real, and on raw terminal coding GPT-5.5 still wins. For a wider view of the open field it now leads, our open source AI models guide tracks the leading options by task.
The more durable point is the cadence. GLM 5.2 landed within a day of Moonshot’s Kimi K2.7 Code — two flagship open-weight coders from two Chinese labs in the same week. For anyone still reaching for a closed frontier model by reflex on routine coding work, GLM 5.2 is reason to make that a decision instead of a default — provided you read the benchmark table as carefully as the company hoped you wouldn’t.
Key Details
| Spec | Detail |
|---|---|
| Lab | Z.ai (Zhipu AI) |
| Total Parameters | ~753B (MoE) |
| Active Parameters | ~40B per token |
| Context Window | 1,048,576 tokens (native) |
| Max Output | 128K tokens |
| License | MIT (open weights) |
| Input Pricing | $1.40 / 1M tokens ($0.26 cached) |
| Output Pricing | $4.40 / 1M tokens |
| Effort Levels | High, Max (same rate) |
| Availability | Z.ai API, Hugging Face (open weights) |
Sources
- GLM-5.2 model card — Hugging Face
- GLM-5.2: Built for Long-Horizon Tasks — Z.ai blog
- Z.ai API pricing
- Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on long-horizon coding for 1/6th the cost — VentureBeat
- Z.ai launches GLM-5.2 with a usable 1M-token context and no benchmarks at launch — MarkTechPost
- GLM-5.2 open weights live; API use carries China data risk — TechTimes
- GLM-5.2 — Simon Willison
