Anthropic's 'Dreaming' Lets Claude Agents Learn From Past Sessions Without Touching Model Weights

Anthropic’s Code with Claude 2026 keynote in San Francisco on May 6 was deliberately not a model launch. Mike Krieger said it plainly from the stage: “No new model today. Today is about how we are making our products work better for you.” What did ship is, on inspection, the more interesting kind of release — a set of features that change what an agent is over time, not what it can do in one shot.

Anthropic unveiled three Managed Agents features (Dreaming, Outcomes, and Multiagent Orchestration), plus a new Routines capability for Claude Code and a doubled rate limit. All of them share a single thesis: in 2026, the unit of work is not the prompt and not even the session. It is the agent, persisting across sessions, accumulating memory, and getting better at its job over time. The biggest infrastructure announcement, an exclusive lease on SpaceX’s Colossus supercluster, sits underneath all of it.

What Dreaming actually is

Dreaming is the easiest of the three to get wrong on first read. The branding is almost provocatively soft — “agents that dream” — and the obvious assumption is that Anthropic has shipped some form of online learning, where the model updates itself between conversations. It hasn’t, and Anthropic is unusually clear about this. “We’re not changing the model itself through dreaming,” senior product lead Brad Albert told reporters. “It’s not doing updates to the weights or anything like that.”

What Dreaming does, mechanically, is this: between sessions, Anthropic’s managed-agents service runs a scheduled job that reads back over your agent’s past sessions and its accumulated memory store, looks for patterns — repeated mistakes, repeated successful strategies, durable user preferences, recurring task structures — and curates the agent’s memory accordingly. The model weights are untouched. The agent’s context — what it knows about its work, its user, and what’s worked before — is what gets reorganized.

The closest mental model is what a person does on a quiet day at the end of a project: re-read the notes, throw out what was wrong, promote what was useful, write a short summary of what you learned. The agent does that on schedule. The Dreaming output is structured memory: condensed recall of past sessions, extracted patterns about user preferences, and notes on failure modes to avoid.
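To make the mechanics concrete, here is a minimal sketch of a curation pass in Python. It is not Anthropic's implementation or API: the SessionRecord and Memory shapes, the curate function, and the min_repeats threshold are all hypothetical, and simple counting stands in for whatever pattern-finding the real job does.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """One past session, reduced to what a curation pass reads (hypothetical shape)."""
    strategy: str    # the approach the agent took
    succeeded: bool  # whether the task completed


@dataclass
class Memory:
    """Curated notes carried into the next session's context (hypothetical shape)."""
    prefer: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)


def curate(sessions: list[SessionRecord], min_repeats: int = 2) -> Memory:
    """Promote strategies that repeatedly worked; flag ones that repeatedly failed."""
    wins = Counter(s.strategy for s in sessions if s.succeeded)
    losses = Counter(s.strategy for s in sessions if not s.succeeded)
    memory = Memory()
    # Sorted so the resulting memory is deterministic across runs.
    for strategy in sorted(set(wins) | set(losses)):
        if wins[strategy] >= min_repeats and wins[strategy] > losses[strategy]:
            memory.prefer.append(strategy)
        elif losses[strategy] >= min_repeats and losses[strategy] > wins[strategy]:
            memory.avoid.append(strategy)
    return memory
```

Note what the sketch never touches: there is no model object anywhere in it. The only thing that changes between sessions is the Memory that gets placed in the agent's context.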

You decide how much control you want. Dreaming can run automatically — patterns extracted and memory updated without a human in the loop — or in review mode, where every proposed memory change waits for human approval before landing. For regulated workflows, the review mode matters; for high-velocity teams, the auto mode is the point.
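The difference between the two modes can be sketched as a gate in front of the memory store. Everything named here (ProposedChange, apply_changes, the "auto" and "review" strings, the approve callback) is an assumption made for illustration, not the product's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedChange:
    """A single memory edit proposed by a curation pass (hypothetical shape)."""
    key: str
    value: str


def apply_changes(
    memory: dict[str, str],
    proposals: list[ProposedChange],
    mode: str = "review",
    approve: Optional[Callable[[ProposedChange], bool]] = None,
) -> tuple[dict[str, str], list[ProposedChange]]:
    """Return (updated memory, changes still waiting for approval)."""
    if mode not in ("auto", "review"):
        raise ValueError(f"unknown mode: {mode!r}")
    updated = dict(memory)  # copy, so the caller's memory is not mutated
    pending: list[ProposedChange] = []
    for change in proposals:
        # Auto mode lands everything; review mode lands only approved changes.
        if mode == "auto" or (approve is not None and approve(change)):
            updated[change.key] = change.value
        else:
            pending.append(change)
    return updated, pending
```

In review mode with no approver supplied, nothing lands and every proposal comes back as pending, which is the conservative default a regulated team would likely want.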

Why this is the right level of abstraction

The reason Dreaming is significant isn’t the feature; it’s the abstraction layer Anthropic chose to work at.

Three abstractions were available. The first is fine-tuning, where every customer’s agent gets a slightly different model after each session. That’s where most “self-improving” claims actually go. It is also catastrophically expensive at scale, brittle when sessions are mixed-quality, and a compliance nightmare for any customer who needs to know exactly what model their agent is running on. Anthropic deliberately didn’t go there.

The second is session-level memory — what most agent products already have, where each session has a working memory but it doesn’t survive. That’s not new, and it doesn’t improve the agent over time.

The third, and the one Dreaming actually picks, is persistent memory curated by a separate process. The model stays fixed. The behavior changes because the prompt context the agent is operating in is genuinely better than it was last week. This is essentially what experienced people do — the model is the brain you came in with; the memory is the years on the job — and it scales gracefully because the curation step is a job, not a training run.

Harvey, the legal AI company, reported that task completion rates rose roughly 6x after implementing Dreaming. Document-review company Wisedocs cut review times by 50% using Outcomes (the second of the three features — more on that below). These numbers should be read with the usual customer-quote caveats: they are vendor-reported, on the customer’s chosen workloads, and on metrics the customer picked. But they are consistent with the qualitative claim that an agent that remembers what worked last week is meaningfully better than one starting fresh.

Outcomes and Multiagent Orchestration

The other two Managed Agents features round out the package.

Outcomes is a goal-specification mechanism. Instead of writing the prompt that defines what the agent should do, you write the result you want: “every invoice over $50K gets manually reviewed before payment,” “support tickets get categorized and routed to the right team within 5 minutes,” “the monthly close ledger balances to within $100.” The agent iterates against the outcome over multiple attempts, learning what worked, and Outcomes interacts with Dreaming so that the patterns it discovers persist into future sessions. The Wisedocs 50% time reduction is on Outcomes specifically.

The pattern matters because it inverts the prompting model. You no longer have to write the procedure. You write the result and let the agent figure out the procedure, then refine over time. For workflows where the procedure is fuzzy but the success criterion is clean — and a lot of business processes look exactly like that — this is the right interface.
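As a rough illustration of the inversion, the loop below takes a success check instead of a procedure and retries until the check passes. The names run_until_outcome, ledger_within_100, and stand_in_agent are hypothetical, and the stand-in "agent" is just arithmetic; the point is the shape of the interface, not how the real feature is built.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_until_outcome(
    attempt: Callable[[list[str]], T],
    check: Callable[[T], Optional[str]],
    max_attempts: int = 5,
) -> tuple[Optional[T], list[str]]:
    """Retry until check() returns None (outcome met) or attempts run out.

    Each failure note is passed to the next attempt, so later tries can
    use what earlier ones got wrong.
    """
    notes: list[str] = []
    for _ in range(max_attempts):
        result = attempt(notes)
        failure = check(result)
        if failure is None:
            return result, notes
        notes.append(failure)
    return None, notes


def ledger_within_100(gap: float) -> Optional[str]:
    """Outcome check: the ledger must balance to within $100."""
    return None if abs(gap) <= 100 else f"ledger off by ${gap:.2f}"


def stand_in_agent(notes: list[str]) -> float:
    """Placeholder for an agent: halves the gap for each failure it has seen."""
    return 250.0 / (2 ** len(notes))
```

Calling run_until_outcome(stand_in_agent, ledger_within_100) fails twice and succeeds on the third attempt. The accumulated notes are the kind of material a curation step could carry into later sessions.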

Multiagent Orchestration lets you scale a fleet of agents on a single task. Instead of one agent attempting a long task linearly, you decompose into subtasks and let multiple agents run in parallel, each with its own scope and context, coordinated by an orchestrator. This is the productized version of patterns like LangGraph and CrewAI, brought inside the Claude Managed Agents surface with native memory, monitoring, and the same governance plane the rest of the platform uses.
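The decompose-and-fan-out pattern itself is simple to sketch with Python's standard library. The orchestrate function and its decompose, worker, and combine callbacks are hypothetical names, and plain functions stand in for agents; this shows the coordination shape only, not the managed service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

R = TypeVar("R")
Out = TypeVar("Out")


def orchestrate(
    task: str,
    decompose: Callable[[str], list[str]],
    worker: Callable[[str], R],
    combine: Callable[[list[R]], Out],
    max_workers: int = 4,
) -> Out:
    """Split a task, run one worker per subtask in parallel, merge the results."""
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in subtask order, regardless of finish order.
        results = list(pool.map(worker, subtasks))
    return combine(results)
```

Each worker sees only its own subtask, which mirrors the "own scope and context" idea. What the sketch leaves out (retries, monitoring, shared memory) is exactly the part a managed platform would be expected to supply.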

The three features compose. Multiagent Orchestration decomposes the work. Outcomes defines what each agent is trying to achieve. Dreaming makes the fleet smarter session over session. Read together, this is Anthropic’s most coherent attempt to ship “the agent platform” as a product, rather than as a collection of capabilities you stitch together.

Routines, rate limits, and the developer story

Code with Claude is, ultimately, a developer conference, and the developer-facing announcements were the ones that played to the room.

Routines for Claude Code is the most consequential. Routines are scheduled, async automations — you describe a recurring task (“review every PR for compliance with our style guide and post a comment with proposed fixes”), Claude Code runs it on schedule against your repo, and you wake up to ready-to-merge PRs in your queue. The framing was “wake up to PRs that are ready to merge,” which is the kind of marketing line that lands at a developer keynote because the underlying experience is genuinely different from anything currently shipping. Async coding agents on a cron schedule, with the same trust model as your CI, is the productized version of what most agent power users were already gluing together with custom infrastructure.

Doubled rate limits for Claude Code’s five-hour window on Pro, Max, and Enterprise plans are the unromantic, important announcement. Rate limits were the binding constraint for teams pushing Claude Code hard. Doubling them removes that ceiling for the customers most likely to be running multi-agent workflows.

These pair naturally with Codex on Mobile, which OpenAI shipped a week later. Both labs are now visibly converging on the same coding-agent shape: async, scheduled, multi-agent, with phone-accessible supervision. The marketing language differs. The product is the same.

The SpaceX Colossus deal

The infrastructure announcement Anthropic was happiest to make was an exclusive lease of all the capacity in SpaceX’s Colossus supercluster. The cluster is among the largest privately held AI compute installations in the world; locking up its capacity is the kind of move that materially changes Anthropic’s training and serving roadmap.

Coming on top of the $30B revenue run rate and the previously announced Google and Broadcom chip deals, the Colossus lease completes a story that has been forming quietly all year: Anthropic is buying compute at a scale that would have been Google-and-Microsoft-only two years ago. Whether that translates into a sustained lead depends on what gets trained on it; the lease is, on its own, neither a model launch nor a product. It is a prepayment on the next two model generations.

For context on the dollar scale, Meta’s $145B AI capex commitment and OpenAI’s $122B funding round make it clear that frontier compute supply is being locked up by maybe four to five buyers. The Colossus lease is Anthropic claiming its share.

What enterprise buyers should take from this

A few practical things for teams already on Claude or evaluating it.

Dreaming pays off on workloads that compound. If the use case is one-shot — generate this blog post, summarize this document — Dreaming adds nothing useful, because there is no “last session” worth learning from. If the use case is a repeating workflow with idiosyncratic context (your specific customers, your specific approval policies, your specific data shape), Dreaming is where the leverage sits. Pilot it on a process you run weekly and measure whether the agent actually gets better.
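One simple way to run that measurement is to log pass/fail per task each week and compare the later weeks against the earlier ones. The helper names below are hypothetical, and splitting the pilot into halves is an arbitrary choice made for the sketch; a real evaluation would also want to control for task difficulty drifting over time.

```python
def completion_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks completed; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def improvement(weekly_outcomes: list[list[bool]]) -> float:
    """Completion rate over the later half of weeks minus the earlier half.

    A positive value suggests the agent is getting better over the pilot.
    """
    if len(weekly_outcomes) < 2:
        raise ValueError("need at least two weeks of results to compare")
    mid = len(weekly_outcomes) // 2
    early = [ok for week in weekly_outcomes[:mid] for ok in week]
    late = [ok for week in weekly_outcomes[mid:] for ok in week]
    return completion_rate(late) - completion_rate(early)
```

A flat or negative number after several weeks is a reasonable signal that the workload is not one that compounds.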

Outcomes inverts the prompting model, which is also a discipline. Defining the outcome cleanly is harder than writing a procedure. You have to know what success looks like before you can ask an agent to achieve it. Teams with clear acceptance criteria and well-defined SLAs will get value fast. Teams without them are going to discover that the work of defining the outcome is the work, and the agent just amplifies whatever clarity you bring.

Multiagent Orchestration is overkill for most workflows. The temptation to decompose every task into a fleet is real. Most tasks don’t need it, and the operational complexity of running and debugging a multi-agent system is non-trivial. Start single-agent. Move to multiagent only when you can articulate why one agent isn’t enough.

The broader question hanging over Claude Managed Agents — and over the agent market generally — is whether this is a platform that improves with use, or a wrapper around a powerful model that doesn’t. Anthropic’s bet, with Dreaming, is that the answer matters and that customers will eventually pick on it. The next two quarters of customer outcomes will tell us whether they’re right.

What to watch

The first thing is how OpenAI and Google answer Dreaming. If persistent agent memory with curated learning is the right abstraction, both labs will ship something equivalent within months. If they don’t, that’s a signal Anthropic’s bet is contrarian. Look for “agent memory” announcements from both labs by Q3.

The second is whether Outcomes scales into SLA-bound enterprise workflows. Defining an outcome is easy for “categorize support tickets.” It gets much harder for “execute the monthly financial close per GAAP.” If Outcomes can be wired into workflows with real compliance teeth, the product becomes something different. If it stays in the casual-task zone, it’s a productivity feature.

The third is Routines adoption among Claude Code’s heaviest users. Async scheduled coding agents are the kind of capability whose value is non-obvious until you’ve lived with it. The customers who lean in will be the ones whose workflows reshape around it; the customers who don’t will keep using Claude Code as an interactive assistant. Case studies from heavy users by mid-summer are the leading indicator.

Code with Claude 2026 will probably be remembered less for any single feature than for being the moment Anthropic decided to ship the agent platform — the persistence, the orchestration, the supervisors, the memory curators — as the product. The model is now a component of that platform, not the product itself.