Kimi K2.7 Code: Moonshot Cuts Thinking Tokens 30%, But the Benchmarks Are All Its Own

Moonshot AI put Kimi K2.7 Code on Hugging Face on June 12 — one day before Z.ai’s GLM 5.2 — and the framing was unusual for a model launch. The pitch is not “smarter.” It is “cheaper to think.” K2.7 Code is a coding-specialized refresh of April’s K2.6, and Moonshot’s headline claim is that it uses roughly 30% fewer “thinking” tokens than K2.6 while scoring higher on the company’s own coding benchmarks. In long agentic runs, where reasoning tokens bill as output, that is a cost story as much as a capability one.

The problem is that almost everything we can say about its capability comes from Moonshot itself.

What Shipped

K2.7 Code is a variant, not a new generation. It keeps the shape of the Kimi K2.5 line: a trillion-parameter mixture of experts with 32 billion active per token, 384 experts (eight selected plus a shared one), a 256K-token context window, and native INT4 weights. The model ID is kimi-k2.7-code, and it ships under a “Modified MIT” license.

Two design choices stand out, and both push the same way. Thinking is mandatory: reasoning cannot be disabled through the API, and a request that tries returns an error. Sampling is fixed at temperature 1.0, top-p 0.95. The model always reasons, and it keeps its full chain across multi-turn conversations. That makes the 30% token cut load-bearing: you can’t claw the cost back by switching reasoning off, so the savings have to come from the model emitting less of it in the first place. Pricing is $0.95 per million input tokens and $4.00 per million output, with cached input at $0.19; the input and output rates are unchanged from K2.6. A “HighSpeed” tier runs around 180 tokens per second at double the price, though it shipped resource-limited at launch.
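To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the rates quoted above. The function name and the token counts in the example are hypothetical, and the sketch assumes the HighSpeed tier doubles every rate uniformly, which the launch material does not spell out.

```python
# Prices in USD per million tokens, as quoted in the article.
INPUT_PER_M = 0.95
CACHED_INPUT_PER_M = 0.19
OUTPUT_PER_M = 4.00


def request_cost(fresh_input: int, cached_input: int, output: int,
                 high_speed: bool = False) -> float:
    """Estimate the USD cost of one request.

    `output` should include reasoning tokens, since they bill as output.
    Assumption: the HighSpeed tier doubles every rate uniformly.
    """
    cost = (fresh_input * INPUT_PER_M
            + cached_input * CACHED_INPUT_PER_M
            + output * OUTPUT_PER_M) / 1_000_000
    return cost * 2 if high_speed else cost


# Hypothetical agent step: 20K fresh input, 80K cached input, 6K output.
print(f"${request_cost(20_000, 80_000, 6_000):.4f}")
```

With these made-up token counts, the 6K output tokens cost more than the 20K fresh input tokens, which is why always-on reasoning matters so much to the bill.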

Why 30% Fewer Tokens Is the Story

In an agentic coding loop (read files, plan, edit, run tests, repeat) the model’s reasoning tokens dominate the bill, because every step reasons before it acts and the chain compounds across turns. A 30% cut on thinking is therefore close to a 30% cut on the dominant cost line, if the claim holds in your workload. Stacked on the base price, that efficiency is what Moonshot is actually selling. Against Claude Opus 4.8 at roughly $5/$25 and GPT-5.5 at $5/$30, K2.7 Code is already roughly six to seven-and-a-half times cheaper per output token ($25 or $30 against $4.00) before the efficiency gain, the same wedge the whole field of coding agents is now being priced against.
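How close the saving gets to 30% depends on how much of the bill is thinking. The sketch below works through that arithmetic. The function name and the example run are hypothetical, and the only figures taken from the article are the $4.00 output rate and the claimed 30% cut.

```python
OUTPUT_PER_M = 4.00  # USD per million output tokens, from the article


def bill_reduction(thinking_tokens: int, other_cost_usd: float,
                   thinking_cut: float = 0.30) -> float:
    """Fraction by which the total bill falls when thinking tokens shrink.

    other_cost_usd covers everything that does not shrink: input tokens,
    cached input, and non-reasoning output such as the emitted code.
    """
    thinking_cost = thinking_tokens * OUTPUT_PER_M / 1_000_000
    total = thinking_cost + other_cost_usd
    if total == 0:
        return 0.0
    return thinking_cut * thinking_cost / total


# Hypothetical run: 2M thinking tokens ($8.00) plus $2.00 of other cost.
print(f"{bill_reduction(2_000_000, 2.00):.0%}")
```

In this made-up run thinking is 80% of the bill, so a 30% cut on thinking trims the total by 24%. The saving only approaches the full 30% as the thinking share approaches 100%.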

One number to retire while you’re here: the “~88% cheaper coding” figure circulating with this release is K2.6’s April comparison, not K2.7’s. Don’t credit it to this model.

The Benchmarks Are All First-Party

This is the honest core of the launch. Moonshot reports gains on Kimi Code Bench v2 (62.0, up from K2.6’s 50.9), Program Bench, MLS Bench Lite, MCP Atlas, and MCP Mark Verified. The names are the tell — these are Moonshot’s internal suites, not SWE-bench, Terminal-Bench, or LiveCodeBench.

And on those same in-house tables, K2.7 Code mostly loses to the frontier it’s being compared against. Kimi Code Bench v2 puts it at 62.0 against GPT-5.5’s 69.0 and Opus 4.8’s 67.4. The one row the launch highlights is MCP Mark Verified, an agentic tool-use test, where K2.7’s 81.1 beats Opus 4.8’s 76.4 but still trails GPT-5.5’s 92.9. A win you have to qualify twice is closer to a draw.
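The gaps are easier to see as differences. The following sketch tabulates only the first-party scores quoted in this article. The helper name is hypothetical, and none of these numbers has been independently reproduced.

```python
# Scores as reported by Moonshot and quoted in this article (first-party).
SCORES = {
    "Kimi Code Bench v2": {
        "K2.7 Code": 62.0, "K2.6": 50.9, "GPT-5.5": 69.0, "Opus 4.8": 67.4,
    },
    "MCP Mark Verified": {
        "K2.7 Code": 81.1, "GPT-5.5": 92.9, "Opus 4.8": 76.4,
    },
}


def gaps(benchmark: str, model: str = "K2.7 Code") -> dict[str, float]:
    """Return the model's score minus each rival's score, rounded to 0.1."""
    row = SCORES[benchmark]
    return {rival: round(row[model] - score, 1)
            for rival, score in row.items() if rival != model}


for name in SCORES:
    print(name, gaps(name))
```

A positive number means K2.7 Code is ahead of that rival on Moonshot's own table, and a negative one means it trails.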

As of mid-June there is no independent result for K2.7 Code on any public leaderboard: no SWE-bench Verified or Pro, no Terminal-Bench, no LiveCodeBench. Moonshot also declined to submit to DeepSWE, the independent benchmark practitioners specifically asked for, and that omission has context: K2.6 had scored just 24% on DeepSWE, far below its self-reported coding numbers. The general rule benchmark watchers keep repeating applies here in full: every model improves by double digits on its own test suite. The asymmetry with its closest rival is the real story. GLM 5.2 at least has a human-preference Code Arena ranking and an Artificial Analysis score corroborating its self-report; Kimi K2.7 Code has neither.

What the Few Independent Testers Found

The handful of independent reads run mixed-to-skeptical. Elliot Arledge’s KernelBench-Hard evaluation concluded that K2.7 is “more honest, not more capable”: it now writes genuine Triton kernels where K2.6 leaned on library wrappers — a real behavior change — but two of its new kernels were buggy and one mixture-of-experts kernel actually regressed. Some of the “double-digit improvement,” in other words, is the model doing the work more legibly rather than better. A two-phase head-to-head from the Kilo team put K2.7 Code behind GLM 5.2 on both planning and build quality, and further behind Claude. VentureBeat’s summary was blunt: the benchmarks don’t check out.

Notice what the critics did not dispute. The 30% token-reduction claim is narrower and more measurable than the capability claims, and it is the part of the pitch that survived contact with outside testing. The capability story is the part to discount.

The License Fine Print

“Modified MIT” is plain MIT until your product crosses 100 million monthly active users or $20M per month in revenue, at which point you must display “Kimi K2.7 Code” prominently in your product’s interface. For essentially everyone that is functionally MIT; for a handful of would-be hyperscale deployers it is an attribution tax. It is more permissive than Llama’s community license and sits squarely in the same genuinely-open territory as GLM 5.2’s plain MIT and DeepSeek’s — part of why Chinese labs, not Western ones, now set the open-weight pace. For the full field, see our open-source AI models guide.
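The trigger described above reduces to a two-threshold check. The sketch below encodes it as summarized in this article. The function name is hypothetical, and reading "crosses" as strictly greater than is an assumption. The license text itself governs.

```python
# Thresholds as summarized in the article; the license text governs.
MAU_THRESHOLD = 100_000_000          # monthly active users
REVENUE_THRESHOLD_USD = 20_000_000   # revenue per month


def attribution_required(monthly_active_users: int,
                         monthly_revenue_usd: float) -> bool:
    """True if either threshold is exceeded (assumed strict inequality)."""
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > REVENUE_THRESHOLD_USD)


# A hypothetical product with 2M users and $500K monthly revenue.
print(attribution_required(2_000_000, 500_000))
```

Either condition alone is enough to trigger the attribution requirement, so a low-revenue product with a very large user base would still have to display the model name.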

The Verdict

Kimi K2.7 Code is a credible, cheap, open-weight coding model whose efficiency claim is probably real and whose capability claims are, for now, vouched for by no one but Moonshot. If you are cost-optimizing an agentic coding pipeline and can run your own evaluations, it earns a slot on the shortlist — next to GLM 5.2, which arrived a day later with better independent evidence behind it. If you can’t run your own evals, wait for the public leaderboards before you trust the launch chart. The token efficiency is the credible part of this release; the benchmarks are a promissory note.

Key Details

Spec	Detail
Lab	Moonshot AI
Model ID	`kimi-k2.7-code`
Total Parameters	1 trillion (MoE)
Active Parameters	32B per token
Context Window	256K tokens
License	Modified MIT
Input Pricing	$0.95 / 1M tokens ($0.19 cached)
Output Pricing	$4.00 / 1M tokens
Reasoning	Always on (cannot be disabled)
Availability	Moonshot API, Hugging Face (open weights)