MCP vs raw tool-calling: when the protocol earns its keep

The question that keeps getting answered badly

Every second thread about agent architecture collapses into a false binary: MCP or function calling, pick a side. That framing is wrong on its face. Provider-native tool calling is how a model requests an action. MCP is how that action gets served — discovered, authenticated, executed, versioned — across many clients. One is a turn in a conversation. The other is an integration contract. You can run either alone, and most production agents end up running both.

The interesting engineering question isn’t which wins. It’s: when does the protocol layer pay for the network hop, the extra process, and the spec churn it drags in? Sometimes it clearly does. Sometimes you’re standing up a JSON-RPC server to expose three functions that live in the same repo as your agent, and you’ve bought yourself latency and a deployment target for nothing.

The same shape, twice

Strip both down and the model-facing contract is nearly identical: a name, a description, and a JSON Schema for the inputs. Here’s Anthropic’s tool-use shape — a tool definition you send inline with the request:

{
  "name": "get_invoice",
  "description": "Fetch an invoice by ID. Use when the user references a specific invoice number.",
  "input_schema": {
    "type": "object",
    "properties": { "invoice_id": { "type": "string" } },
    "required": ["invoice_id"]
  }
}

The model replies with a tool_use block (stop_reason: "tool_use"), your code runs the function, and you send back a tool_result keyed to the tool_use_id. OpenAI’s shape is the same idea with different field names — tools carrying type: "function", a tool_calls array in the response, a role: "tool" message going back. The schemas are data you pass per request, which means a tool can be assembled at runtime and torn down the next turn.
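To make that round trip concrete, here is a minimal sketch of the dispatch step on your side of the wire. It assumes the Anthropic field names described above (tool_use, tool_use_id, is_error); the handlers registry, the runToolCalls helper, and the canned invoice data are hypothetical stand-ins, and no real API call is made.

```typescript
// Local types mirroring only the content-block fields discussed above.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type TextBlock = { type: "text"; text: string };
type ContentBlock = ToolUseBlock | TextBlock;

interface ModelResponse {
  stop_reason: string;
  content: ContentBlock[];
}

type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Hypothetical registry: tool name -> local function.
type Handler = (input: Record<string, unknown>) => Promise<unknown>;

const handlers: Record<string, Handler> = {
  // Stand-in for a real lookup; returns canned data.
  get_invoice: async (input) => ({ invoice_id: input["invoice_id"], total: 120 }),
};

// Turn every tool_use block in a response into a tool_result keyed to its id.
async function runToolCalls(response: ModelResponse): Promise<ToolResultBlock[]> {
  if (response.stop_reason !== "tool_use") return [];
  const results: ToolResultBlock[] = [];
  for (const block of response.content) {
    if (block.type !== "tool_use") continue;
    const handler = handlers[block.name];
    if (!handler) {
      // Report unknown tools back to the model instead of throwing.
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: `Unknown tool: ${block.name}`,
        is_error: true,
      });
      continue;
    }
    try {
      const output = await handler(block.input);
      results.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(output) });
    } catch (err) {
      results.push({ type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true });
    }
  }
  return results;
}
```

The returned blocks would go back to the model as the content of the next user message. Everything here is an in-process function call, which is the point: no second service is involved.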

Now the same capability as an MCP tool, registered on a server (TypeScript SDK):

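import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Server name and version are illustrative; `db` stands in for the
// application's own data layer, which is not shown here.
const server = new McpServer({ name: "invoices", version: "1.0.0" });

server.registerTool(
  "get_invoice",
  {
    description: "Fetch an invoice by ID.",
    // A Zod shape; the JSON Schema served to clients is derived from it.
    inputSchema: { invoice_id: z.string() },
  },
  async ({ invoice_id }) => {
    const inv = await db.invoices.find(invoice_id);
    // Tool results go back to the client as content blocks.
    return { content: [{ type: "text", text: JSON.stringify(inv) }] };
  },
);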

Notice what moved. The schema the model eventually sees is generated from this registration and returned by the server when the client calls tools/list at connect time; it is not hardcoded into the agent. The agent never imported your invoice code. It opened a transport (stdio or Streamable HTTP), asked what tools exist, and got back a list. The client then hands those definitions to the model through the same provider-native tool-calling shape shown earlier. That indirection is the entire value proposition, and the entire cost.
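A rough sketch of that discovery step, written without the SDK so the wire shapes stay visible. The tools/list method name and the name, description, and inputSchema fields come from the protocol; the toAnthropicTools helper and the sample result are illustrative assumptions, not SDK code.

```typescript
// JSON-RPC request a client sends to enumerate a server's tools.
const listRequest = { jsonrpc: "2.0", id: 1, method: "tools/list" } as const;

// One entry in a tools/list result (only the fields used here).
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Inline tool definition in the Anthropic shape shown earlier.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Hypothetical adapter: discovered MCP tools -> per-request tool definitions.
function toAnthropicTools(result: { tools: McpTool[] }): AnthropicTool[] {
  return result.tools.map((t) => ({
    name: t.name,
    description: t.description ?? "",
    input_schema: t.inputSchema,
  }));
}

// Sample result of the kind an invoice server might return.
const listResult: { tools: McpTool[] } = {
  tools: [
    {
      name: "get_invoice",
      description: "Fetch an invoice by ID.",
      inputSchema: {
        type: "object",
        properties: { invoice_id: { type: "string" } },
        required: ["invoice_id"],
      },
    },
  ],
};

const tools = toAnthropicTools(listResult);
```

The mapping is almost a field rename, which is the "same shape, twice" point in practice: the difference is where the definition comes from, not what it looks like.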

The N×M problem, with actual arithmetic

The case for MCP is combinatorial. Say you have M surfaces that need tools — a chat client, an IDE plugin, a CI bot, a Slack agent — and N capabilities — a database query tool, a GitHub bridge, a search index, a deploy trigger. Wire those directly with function calling and each surface re-implements each capability’s invocation, auth, and error handling. That’s M × N integrations to build and keep in sync. Add a fifth surface and you owe four new integrations on day one.

MCP rewrites that as M + N. Each capability becomes one server. Each surface becomes one client that speaks the protocol. The deploy-trigger tool gets written, audited, and rate-limited once; all four — soon five — surfaces inherit it. Change the tool’s behavior and the change propagates through tools/list rather than through four pull requests against four codebases. By early 2026 the public ecosystem listed over 1,000 community servers precisely because someone else already paid the N cost for databases, browsers, and filesystems.
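The arithmetic is small enough to run. This sketch just counts integrations under the two wirings described above; the function names are mine, and the counts ignore how much work each integration actually is.

```typescript
// Direct wiring: every surface integrates every capability itself.
function directIntegrations(surfaces: number, capabilities: number): number {
  return surfaces * capabilities;
}

// Protocol wiring: one client per surface plus one server per capability.
function protocolIntegrations(surfaces: number, capabilities: number): number {
  return surfaces + capabilities;
}

// Extra integrations owed when one more surface is added.
function costOfNextSurface(surfaces: number, capabilities: number) {
  return {
    direct: directIntegrations(surfaces + 1, capabilities) - directIntegrations(surfaces, capabilities),
    protocol: protocolIntegrations(surfaces + 1, capabilities) - protocolIntegrations(surfaces, capabilities),
  };
}

console.log(directIntegrations(4, 4), protocolIntegrations(4, 4)); // 16 8
console.log(costOfNextSurface(4, 4)); // { direct: 4, protocol: 1 }
console.log(directIntegrations(1, 3), protocolIntegrations(1, 3)); // 3 4
```

The last line is worth noticing: with one surface and three tools, the protocol count is the larger of the two.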

This decoupling is also the portability argument. Provider-native tool definitions are provider-shaped: switching from one vendor’s tool_calls to another’s tool_use means rewriting every definition and every result-handling branch. An MCP server doesn’t care which model is upstream. The lock-in moves from your integration code to a wire format that multiple vendors now consume. If you’ve migrated an agent across providers — I’ve written up the scars from one such migration in a case study — that portability is not abstract.
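To show what "provider-shaped" means for a single definition, here is a sketch that converts the Anthropic-style tool from earlier into the type: "function" shape. It assumes the Chat Completions-style layout (a function object carrying name, description, and parameters); the toOpenAITool helper is hypothetical, and a real migration also has to rewrite the result-handling side, which this does not cover.

```typescript
// Anthropic-style inline tool definition, as shown earlier.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Chat Completions-style tool definition (assumed layout).
interface OpenAITool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Hypothetical converter: same JSON Schema, different envelope.
function toOpenAITool(tool: AnthropicTool): OpenAITool {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema,
    },
  };
}

const getInvoice: AnthropicTool = {
  name: "get_invoice",
  description: "Fetch an invoice by ID. Use when the user references a specific invoice number.",
  input_schema: {
    type: "object",
    properties: { invoice_id: { type: "string" } },
    required: ["invoice_id"],
  },
};

const converted = toOpenAITool(getInvoice);
```

One definition is trivial to convert. The cost shows up when there are dozens of them plus a response parser per provider, which is the work an MCP server sidesteps by never knowing which model is upstream.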

Where MCP is overkill

Now the honest half. The N×M math only bites when more than one surface needs the tools (M > 1), or when N is large and the tools are shared beyond a single application. For a single application with three tools that live in the same deployment, MCP buys you:

A network hop. With function calling, the tool itself runs in-process: the only remote round trips are the ones to the model API. MCP adds a round trip to a separate server, whether that is an HTTP call to localhost or a pipe to a local stdio process. On latency-sensitive paths — voice agents, autocomplete — that hop is real.
A second thing to run. Your three functions were imports. Now they’re a process with a transport, a lifecycle, health checks, and a place in your deployment topology.
Spec exposure. MCP is moving fast. The stable spec was 2025-11-25; the 2026 roadmap points at a stateless core, a Tasks extension for long-running work, and tightened OAuth-aligned authorization. That’s healthy evolution, but it’s surface area you now track for a server that talks to one client you also own.

If those three functions never leave the repo, function calling is the correct answer. Define the schema inline, handle the tool_use block, move on. Reaching for a protocol to decouple a thing from itself is architecture as cargo cult.

The decision, compressed

| Signal | Reach for function calling | Reach for MCP |
| --- | --- | --- |
| Surfaces using the tool | One | Two or more |
| Tool lifetime | Built per-request, dynamic | Stable, shared |
| Latency budget | Tight (single hop matters) | Tolerant of a round trip |
| Governance need | Inline in app | Gateway: auth, rate limit, audit |
| Ownership | You own both ends | Tool owned by another team |
| Provider portability | Single vendor, fine | Multi-vendor or hedging |
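If it helps to have the table as something you can run against a tool inventory, here is one way to sketch it. The ToolSignals fields and the tallySignals helper are my own naming, and the sketch deliberately stops at listing which rows point which way; how to weight them is a judgment the table does not make for you.

```typescript
// One field per row of the table above (names are illustrative).
interface ToolSignals {
  surfaces: number;            // clients that will call this tool
  builtPerRequest: boolean;    // schema assembled dynamically each turn
  tightLatencyBudget: boolean; // a single extra hop matters
  needsGateway: boolean;       // central auth, rate limiting, audit
  ownedByAnotherTeam: boolean;
  multiVendor: boolean;        // more than one model provider, or hedging
}

interface Tally {
  functionCalling: string[];
  mcp: string[];
}

// Sort each row into the column it points at. No weighting is applied.
function tallySignals(s: ToolSignals): Tally {
  const tally: Tally = { functionCalling: [], mcp: [] };
  const vote = (towardMcp: boolean, row: string): void => {
    (towardMcp ? tally.mcp : tally.functionCalling).push(row);
  };
  vote(s.surfaces >= 2, "Surfaces using the tool");
  vote(!s.builtPerRequest, "Tool lifetime");
  vote(!s.tightLatencyBudget, "Latency budget");
  vote(s.needsGateway, "Governance need");
  vote(s.ownedByAnotherTeam, "Ownership");
  vote(s.multiVendor, "Provider portability");
  return tally;
}

// Hypothetical example: a shared deploy trigger used by four surfaces.
const deployTrigger = tallySignals({
  surfaces: 4,
  builtPerRequest: false,
  tightLatencyBudget: false,
  needsGateway: true,
  ownedByAnotherTeam: true,
  multiVendor: false,
});
console.log(deployTrigger.mcp.length, deployTrigger.functionCalling.length); // 5 1
```

A single row can still dominate: a tight latency budget may settle the question on its own even when every other row points at MCP.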

The cleanest production shape I keep landing on uses both deliberately: function calling for the application-specific tools that will only ever serve this one agent, and MCP for the shared infrastructure — the database bridge three teams hit, the deploy trigger that needs an audit trail. The protocol earns its keep at the boundary where a capability outlives the agent that first called it.

Why it matters

The reflexive “use MCP for everything” advice produces the same failure mode as every premature-abstraction cycle before it: teams standing up servers, transports, and OAuth flows to expose functions that a four-line inline schema would have handled. The reverse — refusing the protocol when you genuinely have M surfaces and N shared tools — quietly recreates the M×N integration sprawl MCP was built to delete, one pull request at a time.

Treat the choice as arithmetic, not allegiance. Count your surfaces, count your shared tools, price the hop. If M+N beats M×N for your topology, the server is worth it. If it doesn’t, the inline schema you already know how to write is not a compromise — it’s the right tool.