
MCP in 2026: the protocol grew up

Eighteen months of spec revisions turned a clever stdio hack into stateless, OAuth-aligned infrastructure you can actually put behind a load balancer.

By Tishan David

The first MCP server I wrote, in early 2025, talked to its client over stdin and stdout. It worked beautifully on my laptop and was completely useless to anyone else, because it lived and died with a child process I’d spawned. That was the whole shape of the protocol at launch: a JSON-RPC pipe between two processes on the same machine. Eighteen months and four spec revisions later, the Model Context Protocol you deploy in 2026 is closer to a normal HTTP service with OAuth than to that local pipe. The interesting part is how deliberately it got there.

From a local pipe to a remote service

Anthropic shipped MCP in November 2024 with a single useful transport — stdio — and a second, awkward one: HTTP plus a long-lived Server-Sent Events stream. The HTTP+SSE design used two endpoints (a POST for requests, a separate GET that held an SSE connection open for server-to-client messages) and it pinned every client to one server process for the life of the session. You could not put it behind a round-robin load balancer without sticky sessions and a shared session store. For anyone trying to run MCP as a real service, that was the wall.

The 2025-03-26 revision tore that out and introduced Streamable HTTP. One endpoint. The server replies to a POST with either a plain JSON response or, when it needs to stream, an SSE body on that same request. Crucially, an Authorization: Bearer header can ride on every request instead of being negotiated once at the top of a stream. That single change is what made remote MCP servers, and the whole hosted-server market that followed, viable.
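To make the "one endpoint, two response shapes" idea concrete, here is a minimal client-side sketch. It assumes the response headers and body have already been read into strings, and the function names are mine, not an SDK's. It branches on Content-Type and returns the JSON-RPC messages either way.

```python
import json


def parse_sse_events(body: str) -> list:
    """Return the data payload of each Server-Sent Event in the body."""
    events = []
    data_lines = []
    for line in body.splitlines():
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comments are ignored here.
    if data_lines:
        # Lenient: also keep a final event that lacks the trailing blank line.
        events.append("\n".join(data_lines))
    return events


def read_post_response(content_type: str, body: str) -> list:
    """Decode the reply to a POST into a list of JSON-RPC messages."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return [json.loads(body)]
    if media_type == "text/event-stream":
        return [json.loads(data) for data in parse_sse_events(body)]
    raise ValueError(f"unexpected content type: {content_type!r}")


plain = read_post_response(
    "application/json",
    '{"jsonrpc": "2.0", "id": 1, "result": {}}',
)
streamed = read_post_response(
    "text/event-stream; charset=utf-8",
    'data: {"jsonrpc": "2.0", "method": "notifications/progress"}\n\n'
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n\n',
)
```

A real client would read the stream incrementally rather than buffering the whole body; the buffering here only keeps the branching logic visible.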

Here’s the version timeline worth memorising, because the revision string is what you actually negotiate on the wire:

  • 2024-11-05 — launch. stdio + HTTP/SSE.
  • 2025-03-26 — Streamable HTTP, first real authorization framework (OAuth 2.1 subset).
  • 2025-06-18 — authorization hardened into proper OAuth resource-server semantics; structured tool output; elicitation.
  • 2025-11-25 — current stable. OIDC discovery, icons, tool-calling in sampling, experimental tasks, JSON Schema 2020-12 as default dialect.
  • 2026-07-28 — the release candidate locked on 21 May, publishing late July. Stateless core.
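Because the revision strings are ISO dates, they sort correctly as plain text, which keeps negotiation simple. The sketch below reflects my reading of the handshake-based revisions (up to 2025-11-25): the server echoes the requested version if it supports it, otherwise offers its latest, and the client walks away if it can't speak the offer. The function names and the supported-version list are illustrative assumptions.

```python
# Revisions this hypothetical server implements.
SERVER_SUPPORTED = ["2024-11-05", "2025-03-26", "2025-06-18", "2025-11-25"]


def choose_protocol_version(requested: str, supported: list) -> str:
    """Server side: echo the requested revision, or fall back to the newest."""
    if requested in supported:
        return requested
    # ISO-8601 dates compare correctly as strings, so max() is the latest.
    return max(supported)


def client_accepts(offered: str, client_supported: list) -> bool:
    """Client side: continue only if the server's offer is one we speak."""
    return offered in client_supported


offer = choose_protocol_version("2026-07-28", SERVER_SUPPORTED)
ok = client_accepts(offer, ["2025-11-25", "2026-07-28"])
```

Here a client asking for 2026-07-28 is offered 2025-11-25, and it accepts because that revision is also on its own list.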

Authorization stopped being hand-wavy

The launch spec barely addressed auth. The 2025-06-18 revision is where it became something a security team would sign off on. Three RFCs do the heavy lifting.

MCP servers are now classified as OAuth 2.0 resource servers only — they validate tokens, they don’t mint them. The old fallback endpoints (/authorize, /token, /register baked into the MCP server) are gone, replaced by mandatory RFC 9728 Protected Resource Metadata. When a client hits a protected server unauthenticated, it gets a 401 that tells it exactly where to go:

GET /mcp HTTP/1.1
Host: tools.example.com

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer resource_metadata="https://tools.example.com/.well-known/oauth-protected-resource"

The client fetches that metadata document, reads the authorization_servers field, and obtains a token from the real authorization server. When it requests that token it must include an RFC 8707 resource indicator (the resource parameter), which binds the token to this specific MCP server. That binding closes the most obvious hole: a token issued for one server is not valid anywhere else, so a compromised or malicious server can’t replay it against another service. The spec pairs it with an explicit ban on token passthrough: a server must not forward the client’s token to upstream APIs. If you’re building servers that broker access to third-party APIs, that ban is the rule people most often get wrong, and it shows up the moment you trace how access propagates through a production deployment. I’ve written that up in more detail in a set of build notes on shipping MCP in anger. The 2025-11-25 revision then layered OpenID Connect discovery and incremental scope consent on top.
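The discovery steps can be sketched without any network calls. This is a rough illustration, not an SDK API: the function names, the auth.example.com endpoint, the client ID, and the PKCE challenge value are all placeholders I made up, and the authorization endpoint would normally come from the authorization server's own metadata. The last function shows the server-side half, checking that a token's audience names this server before accepting it.

```python
import re
from typing import Optional
from urllib.parse import urlencode


def resource_metadata_url(www_authenticate: str) -> Optional[str]:
    """Pull the resource_metadata URL out of a 401's WWW-Authenticate header."""
    match = re.search(r'resource_metadata="([^"]+)"', www_authenticate)
    return match.group(1) if match else None


def authorization_url(authorization_endpoint: str, client_id: str,
                      redirect_uri: str, resource: str,
                      code_challenge: str, state: str) -> str:
    """Build an authorization request that carries the RFC 8707 resource parameter."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "code_challenge": code_challenge,   # PKCE challenge
        "code_challenge_method": "S256",
        "state": state,
        "resource": resource,               # binds the token to this MCP server
    }
    return f"{authorization_endpoint}?{urlencode(params)}"


def token_is_for_this_server(claims: dict, canonical_uri: str) -> bool:
    """Server side: accept a token only if its audience names this server."""
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return canonical_uri in audiences


header = ('Bearer resource_metadata='
          '"https://tools.example.com/.well-known/oauth-protected-resource"')
metadata_url = resource_metadata_url(header)

# What the fetched RFC 9728 document might contain (illustrative values).
metadata = {
    "resource": "https://tools.example.com/mcp",
    "authorization_servers": ["https://auth.example.com"],
}
url = authorization_url(
    authorization_endpoint=metadata["authorization_servers"][0] + "/authorize",
    client_id="example-client",
    redirect_uri="http://localhost:8765/callback",
    resource=metadata["resource"],
    code_challenge="placeholder-challenge",
    state="placeholder-state",
)
```

The audience check is also what gives the passthrough ban teeth: an upstream API doing the same check would reject a token whose audience is the MCP server rather than itself.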

The 2026 stateless core

The release candidate dated 2026-07-28 is the biggest swing since launch, and it’s aimed squarely at operations. It removes the initialize/initialized handshake and the Mcp-Session-Id header. Client capability info now travels in _meta on every request, so any server instance can serve any request — genuinely behind a plain load balancer, no shared session state.
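A toy model shows why per-request context makes instances interchangeable. The key names inside _meta (clientCapabilities here), the servedBy field, and the tools/call payload shape are assumptions for illustration; I have not copied them from the RC text. The point is only that the handler reads everything it needs from the request itself.

```python
from itertools import cycle


class ServerInstance:
    """A server replica that keeps no per-client state between requests."""

    def __init__(self, name: str):
        self.name = name

    def handle(self, request: dict) -> dict:
        # Everything the handler needs arrives with the request.
        meta = request.get("params", {}).get("_meta", {})
        capabilities = meta.get("clientCapabilities", {})  # hypothetical key
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "result": {
                "servedBy": self.name,  # demo-only field
                "elicitationAvailable": "elicitation" in capabilities,
            },
        }


def make_request(request_id: int) -> dict:
    """Build a request that carries the client's capabilities in _meta."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "example_tool",
            "_meta": {"clientCapabilities": {"elicitation": {}}},
        },
    }


def dispatch_round_robin(requests: list, instances: list) -> list:
    """Send each request to the next instance, like a plain load balancer."""
    rotation = cycle(instances)
    return [next(rotation).handle(request) for request in requests]


responses = dispatch_round_robin(
    [make_request(1), make_request(2)],
    [ServerInstance("a"), ServerInstance("b")],
)
```

Both instances reach the same conclusion about the client because neither had to remember an earlier handshake.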

Server-to-client requests get reworked to match. Instead of holding an SSE stream open waiting to ask the user something, a server returns an InputRequiredResult; the client gathers the answer and reissues the original call with the request state attached. Because everything needed is in the payload, the retry can land on a different instance. There’s more operational plumbing too: Mcp-Method and Mcp-Name headers let proxies route without parsing the body, results carry ttlMs and cacheScope for caching, and W3C Trace Context rides in _meta for distributed tracing.
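The retry pattern is easier to see as a pure function. In this sketch the field names (type, prompt, requestState, inputResponse) and the delete-file tool are hypothetical stand-ins rather than the RC's wire format. The handler returns an input-required result with its state serialized into the payload, and the client reissues the call with the answer and that state attached.

```python
import base64
import json


def encode_state(state: dict) -> str:
    """Serialize handler state so it can travel with the client's retry."""
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()


def decode_state(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def handle_delete(params: dict) -> dict:
    """Stateless handler: the result depends only on the request payload."""
    answer = params.get("inputResponse")
    if answer is None:
        # First pass: ask the client to collect a confirmation.
        return {
            "type": "input_required",
            "prompt": f"Delete {params['path']}?",
            # A real server should sign or encrypt this to prevent tampering.
            "requestState": encode_state({"path": params["path"]}),
        }
    state = decode_state(params["requestState"])
    if state["path"] != params["path"]:
        raise ValueError("request state does not match the retried call")
    return {"type": "complete", "deleted": bool(answer.get("confirm"))}


def call_with_input(handler, params: dict, ask_user) -> dict:
    """Client loop: reissue the original call until no more input is needed."""
    result = handler(params)
    while result["type"] == "input_required":
        answer = ask_user(result["prompt"])
        params = {
            **params,
            "inputResponse": answer,
            "requestState": result["requestState"],
        }
        result = handler(params)
    return result


outcome = call_with_input(
    handle_delete,
    {"path": "/tmp/report.txt"},
    lambda prompt: {"confirm": True},
)
```

Since handle_delete keeps nothing between calls, the second invocation could just as well run on a different process than the first.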

It also brings the first formal deprecation policy — a 12-month window between deprecation and removal — and immediately uses it. Roots, Sampling, and Logging are now deprecated. If you’re starting fresh, don’t build on those three. Two capabilities move into a proper extensions framework with reverse-DNS identifiers: MCP Apps (server-rendered HTML interfaces) and a redesigned Tasks extension for durable, pollable operations.

What’s actually stable, and what to avoid

The SDK story is the clearest signal of maturity. There are now official SDKs for TypeScript, Python, Go (with Google), C# (with Microsoft), Java (with Spring AI), plus Rust, Ruby, Kotlin, and more. They’re versioned independently of the spec and the maintainers get a ten-week window between RC lock and final publication to catch up. Stick to the official SDK for your language; the transport, auth, and now-stateless mechanics are exactly the parts you do not want to reimplement by hand.

The reference servers repo tells the other half of the story. It deliberately shrank. modelcontextprotocol/servers now holds seven reference implementations — Everything, Fetch, Filesystem, Git, Memory, Sequential Thinking, and Time — and the rest (Brave Search, the old GitHub and Google Drive integrations, the Postgres/SQLite/Redis servers) were moved out to a servers-archived repository. Read that as intent: those reference servers are educational examples, not production dependencies. The maintainers are pushing real servers toward the published registry and toward vendors who own them. If you copied the archived GitHub or Postgres server into a product, you’re on an unmaintained fork and should know it.

So, concrete guidance for mid-2026:

  • Rely on: Streamable HTTP, RFC 9728 + RFC 8707 authorization, structured tool output, the official SDKs, and 2025-11-25 as your stable target.
  • Build toward: the stateless 2026-07-28 model — design servers as if no session state exists, because soon it won’t.
  • Avoid: the legacy HTTP+SSE transport, the removed in-server OAuth endpoints, token passthrough, Roots/Sampling/Logging for anything new, and treating the archived reference servers as supported.

Why it matters

MCP started life in late 2024 as a clever way to wire a model to a tool on your own machine. The reason it’s now negotiated by vendors, fronted by load balancers, and audited by security teams is that each revision removed a specific reason you couldn’t run it as real infrastructure: the dual-endpoint stream, the hand-rolled auth, the sticky session. The decisions that hold up best are the boring, structural ones (OAuth resource-server semantics, RFC-backed token binding, statelessness) rather than the feature checklist. If you’re deciding how to structure your own servers around those constraints, that’s the layer worth getting right first, and it’s the through-line of the architecture material in the handbook. The protocol grew up the way good protocols do: by deleting the parts that didn’t scale.

What I’m testing next

I’m porting a stateful 2025-11-25 server to the 2026-07-28 RC to measure what the handshake removal actually buys in a multi-instance deployment, and whether the InputRequiredResult retry pattern holds up under a real elicitation flow rather than a demo. I’ll report the numbers once the final spec lands in late July.