Apple WWDC 2026: Siri Runs on Gemini Now, and Apple Is Fine With That

Apple's rebuilt Siri runs on a custom 1.2T-parameter Google Gemini model at ~$1B/year — in Tim Cook's last keynote, Apple conceded the model race.

S5 Labs Team June 8, 2026

Apple opened WWDC 2026 this morning by conceding the one thing it spent two years insisting it would not. The rebuilt assistant — now branded Siri AI — does not get its reasoning from an Apple model. The heavy lifting runs on a custom 1.2-trillion-parameter version of Google’s Gemini, licensed for a reported $1 billion a year. The keynote was also Tim Cook’s last as chief executive before he hands the role to hardware chief John Ternus on September 1, which gave the moment a valedictory weight it might not otherwise have carried. The most consequential thing Cook announced on his way out was that Apple had given up trying to build the brain of its own assistant.

The framing onstage was about Siri finally working. The story underneath is about what Apple decided it is willing to outsource, and the answer changes how the company’s AI position should be read.

The deal

The model powering Siri is a Gemini variant Google built to Apple’s specification — 1.2 trillion parameters, running on Nvidia hardware inside Google Cloud. The reported terms are roughly $1 billion a year, with Deepwater’s Gene Munster pegging the multi-year value as high as $5 billion. Critically, Apple’s contract bars Google from using Apple users’ Siri queries to train future Gemini models, which is the clause that lets Apple keep its privacy posture while running a competitor’s model.

That last point is the whole negotiation in miniature. Apple is paying a premium and writing a restrictive contract precisely so it can rent capability without surrendering the thing it actually sells, which is the promise that your data is not the product. Whether buyers read “Gemini-powered Siri” as Apple’s privacy or Google’s surveillance is the perception battle Apple just signed up to fight.

The routing is the privacy argument

Siri now decides where each request goes through three tiers. Simple queries are handled on-device by Apple’s own smaller models, for speed and because they never leave the phone. Mid-weight tasks run through Apple’s Private Cloud Compute, the company’s own confidential-computing infrastructure. Only the heaviest queries — the ones that actually need a frontier model’s reasoning — get routed out to the 1.2-trillion-parameter Gemini.

This is the same insight Google built its own I/O keynote around three weeks ago when it moved the Gemini app to compute-based metering: not every request deserves a frontier model, and routing by difficulty is how you keep the economics and the latency sane. Apple is applying it to privacy as much as cost. The architecture means most of what you ask Siri never touches Google’s infrastructure at all, and Apple can say — accurately — that Gemini only sees the fraction of queries that genuinely require it.
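The three-tier routing can be sketched in a few lines. This is a minimal illustration, not Apple's implementation: the `difficulty` score, the two cutoffs, and every name below are assumptions made up for the example, since Apple has not published how Siri classifies requests.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device Apple model"
    PRIVATE_CLOUD = "Private Cloud Compute"
    FRONTIER = "custom Gemini model"


@dataclass(frozen=True)
class Request:
    text: str
    difficulty: float  # hypothetical score in [0, 1]; the real signal is not public


# Hypothetical cutoffs, chosen only for illustration.
ON_DEVICE_MAX = 0.3
PRIVATE_CLOUD_MAX = 0.7


def route(request: Request) -> Tier:
    """Pick the most local tier whose cutoff covers the request's difficulty."""
    if not 0.0 <= request.difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    if request.difficulty <= ON_DEVICE_MAX:
        return Tier.ON_DEVICE
    if request.difficulty <= PRIVATE_CLOUD_MAX:
        return Tier.PRIVATE_CLOUD
    return Tier.FRONTIER


def share_leaving_apple(requests: list[Request]) -> float:
    """Fraction of requests that are routed to the third-party model."""
    if not requests:
        return 0.0
    external = sum(1 for r in requests if route(r) is Tier.FRONTIER)
    return external / len(requests)


if __name__ == "__main__":
    sample = [
        Request("set a timer", 0.05),
        Request("what's the weather", 0.10),
        Request("summarize this thread", 0.50),
        Request("plan a three-city trip around my calendar", 0.90),
    ]
    for r in sample:
        print(f"{r.text!r} -> {route(r).value}")
    print(f"share reaching Gemini: {share_leaving_apple(sample):.0%}")
```

With these made-up scores, only one of the four sample requests reaches the frontier tier. The privacy claim rests on that ratio staying small in practice.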

What Siri can finally do

The capability that justifies all of this is the set of features Apple first promised at WWDC 2024 and then quietly delayed for two years because its own models could not deliver them reliably. Personal context: Siri can now reference information across your apps and history. On-screen awareness: it can understand and act on what is currently displayed. App action execution: it can carry out multi-step tasks across multiple apps on your behalf.

Two changes are new rather than delayed. Siri gets a standalone app for the first time — a chat thread you type or talk to, with conversation history synced across your devices, much the way you already use ChatGPT or Gemini — and it now surfaces inside the Dynamic Island. The app is the quiet admission underneath the demos: Apple has stopped treating Siri as a voice shortcut wired into the OS and started treating it as a destination, which is the shape its competitors settled on years ago.

These are the table-stakes capabilities of a real assistant, and Apple’s inability to ship them on home-grown models is the entire reason today happened. The delayed-features backstory matters because it explains the concession. Apple did not rent Gemini because Google offered a good price. It rented Gemini because two years of trying to do it in-house produced a Siri that still could not reliably tell you what was on your own screen.

Apple chose to be the product layer

The strategic read is the part worth sitting with. Apple has decided that being the best integration, distribution, and privacy layer over someone else’s model beats shipping a worse model of its own. It will own the device, the on-device tier, the routing logic, the privacy contract, and the billion-user surface Siri runs on. It will rent the frontier reasoning.

That is the exact opposite of the bet Microsoft made at Build the same week, where the entire point of the in-house MAI models was to stop depending on a partner’s frontier model. Amazon made the same call as Microsoft earlier this spring when it rebuilt Alexa around its own AI rather than a licensed one. Two of the most powerful companies in technology looked at the same dependency question and split: Microsoft decided owning the model was worth building seven of them; Apple decided owning the model was not worth shipping a bad assistant for another two years. Both can be right, because they are selling different things. Microsoft sells developer tools where inference cost is the margin. Apple sells hardware where the assistant is a feature that has to work, and the cheapest path to “works” ran through Mountain View.

The risk Apple is taking is strategic dependence. Its flagship assistant’s intelligence now sits on a contract with a direct competitor, and contracts get renegotiated. Google has every incentive to raise the price or change the terms once Siri’s quality is publicly tied to Gemini and Apple has no in-house fallback ready. Apple has bought itself a working Siri and a recurring vulnerability in the same deal.

The multi-model hedge

Apple did leave itself one exit. The new Extensions system lets users route requests to ChatGPT, Gemini, or Claude as alternatives, positioning Apple as a neutral front end rather than a Google reseller. It is a modest hedge — Gemini is the default and the one Siri itself runs on — but it establishes the principle that the model underneath is swappable. If the Google relationship sours, the architecture to plug in a different provider already exists, which quietly improves Apple’s hand in every future renegotiation.
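What "swappable" means architecturally can be shown with a small provider registry. This is a hedged sketch only: the Extensions API is not described in this article, so `ModelRegistry`, `build_registry`, and the stub providers are hypothetical names, and the providers return canned strings instead of calling any real service.

```python
from typing import Callable, Dict, Optional

# A provider is anything that maps a prompt to a reply (hypothetical interface).
Provider = Callable[[str], str]


class ModelRegistry:
    """Keeps named providers behind one front end, with a replaceable default."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}
        self._default: Optional[str] = None

    def register(self, name: str, provider: Provider, default: bool = False) -> None:
        self._providers[name] = provider
        # The first registered provider becomes the default unless overridden.
        if default or self._default is None:
            self._default = name

    def set_default(self, name: str) -> None:
        if name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        self._default = name

    def ask(self, prompt: str, provider: Optional[str] = None) -> str:
        name = provider or self._default
        if name is None or name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        return self._providers[name](prompt)


def _stub(label: str) -> Provider:
    # Stand-in for a real model call; it only tags the prompt with the provider name.
    return lambda prompt: f"[{label}] {prompt}"


def build_registry() -> ModelRegistry:
    registry = ModelRegistry()
    registry.register("gemini", _stub("gemini"), default=True)
    registry.register("chatgpt", _stub("chatgpt"))
    registry.register("claude", _stub("claude"))
    return registry


if __name__ == "__main__":
    registry = build_registry()
    print(registry.ask("draft a reply"))                     # default provider
    print(registry.ask("draft a reply", provider="claude"))  # per-request choice
    registry.set_default("chatgpt")                          # swap the default
    print(registry.ask("draft a reply"))
```

Callers only ever talk to `ask`, so changing the default is a one-line operation. That is the sense in which the model underneath stops being a structural commitment.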

That hedge, more than the demos, is the tell. Apple built a Siri that works by renting the best available model, and it built the seams to swap that model out the moment the math or the politics changes. Cook’s last keynote will be remembered for the concession, but the more Apple-like move was making sure the concession is reversible. Whether the company that once defined itself by controlling its whole stack can hold its position as a renter of the most important part is the question John Ternus inherits.
