
How MCP Actually Works Under the Hood

I spent the better part of a weekend last month building an MCP server to connect Claude to my team's shared Obsidian vault. I went in thinking MCP (Model Context Protocol) was just another fancy wrapper for LLM function calling, something that would save me a few hours of boilerplate but didn't have any interesting design under the hood. I was wrong.

After debugging that server through an OOM crash, a blown-out context window, and a handful of weird handshake errors, I realized MCP's design is full of deliberate tradeoffs that make it flexible enough for everything from local single-user tools to remote multi-user AI platforms. In this deep dive, I'll break down exactly how MCP works under the hood, from the initial connection handshake to the final tool response, with concrete examples, real gotchas I ran into, and practical guidance for building your own.

What Is MCP, Anyway?

Model Context Protocol is an open, standardized protocol for connecting AI models (and the hosts that run them) to external data, tools, and services. Before MCP, every AI host had its own custom format for tool calls, which meant if you built a tool for Claude Desktop, you had to rebuild it from scratch to work with ChatGPT, OpenAI's GPTs, or any custom AI app your team built. MCP solves that by creating a common language for hosts and tools to talk to each other, regardless of what LLM you're using or where you're running your server.

At its core, MCP defines the message format and flow. It specifies standard transports (stdio and an HTTP-based one) but permits custom ones, and it leaves how servers store and manage state up to implementers. That's why it works just as well for local servers running over stdio on your laptop as it does for remote multi-tenant servers running on a cloud instance.

Why MCP Chose JSON-RPC 2.0

One of the first questions most developers ask when looking at MCP is: why JSON-RPC? Why not REST, GraphQL, or protobuf? The answer comes down to deliberate tradeoffs aligned with MCP's core goals: accessibility, flexibility, and debuggability.

Let's break down the alternatives and why JSON-RPC won out:

  • **REST**: REST is resource-oriented, which works great for fetching data, but tool calling is inherently procedure-oriented. To support arbitrary tool calls with dynamic inputs, you'd end up inventing a custom request/response format on top of REST anyway, which defeats the purpose of using a standard.
  • **GraphQL**: GraphQL is excellent for flexible data querying, but it adds significant overhead for schema parsing and is not well-suited for arbitrary method calls with side effects. It also forces a specific query pattern that doesn't fit MCP's support for one-way notifications (like logging or resource updates).
  • **Protobuf**: Protobuf is smaller and faster than JSON, but it's not human-readable, which makes debugging local development a huge pain. It also requires pre-defining schemas and generating code, which adds friction for small, simple MCP servers that a developer might build in an afternoon.

JSON-RPC 2.0, by contrast, is already built for exactly what MCP needs: it has a standardized format for requests, responses, and one-way notifications, it works across any transport (stdio, HTTP, WebSockets), it's human-readable for debugging, and it has almost zero overhead to get started.

The tradeoff is clear: JSON-RPC is slower than protobuf for high-throughput use cases, and it doesn't enforce built-in type safety. But most tool calls are only a few kilobytes, so the performance difference is unnoticeable to end users, and the development and debugging benefit of human-readable JSON far outweighs the performance cost. A more compact encoding like protobuf could conceivably be layered in later, but JSON-RPC was the right choice for broad adoption today.
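To make the three JSON-RPC 2.0 message kinds concrete, here is a minimal TypeScript sketch. The type names and the `classify` helper are my own illustration, not the SDK's types; the only rules it relies on come from JSON-RPC itself: a message with a `method` and an `id` is a request, a message with a `method` but no `id` is a notification, and anything else is a response.

```typescript
// Minimal JSON-RPC 2.0 message shapes (illustrative, not the MCP SDK's types).
type JsonRpcId = string | number;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: JsonRpcId; // an id means the sender expects a response
  method: string;
  params?: unknown;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string; // no id: one-way, no response is sent
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId; // echoes the id of the request being answered
  result?: unknown; // a response carries either result or error
  error?: { code: number; message: string; data?: unknown };
}

type JsonRpcMessage = JsonRpcRequest | JsonRpcNotification | JsonRpcResponse;
type MessageKind = "request" | "notification" | "response";

// Classify a parsed message by which fields it carries.
function classify(msg: JsonRpcMessage): MessageKind {
  if ("method" in msg) {
    return "id" in msg ? "request" : "notification";
  }
  return "response";
}

const request: JsonRpcMessage = { jsonrpc: "2.0", id: 1, method: "tools/list" };
const notification: JsonRpcMessage = {
  jsonrpc: "2.0",
  method: "notifications/message",
  params: { level: "info", data: "vault indexed" },
};
const response: JsonRpcMessage = { jsonrpc: "2.0", id: 1, result: { tools: [] } };

console.log(classify(request), classify(notification), classify(response));
// prints: request notification response
```

Because the kind is fully determined by the fields present, the same parser works unchanged whether the bytes arrived over stdio or HTTP.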

The MCP Connection Lifecycle Step-by-Step

MCP's connection lifecycle is designed to fail fast on mismatched capabilities and avoid race conditions with async messages, with a small tradeoff of an extra round trip during initialization. Let's walk through it step by step:

  1. **Transport establishment**: First, the underlying transport is set up. For local MCP servers (like the ones you add to Claude Desktop running on your laptop), the host spawns the server process and communicates over its stdin/stdout. For remote servers, the host connects to the server's HTTP endpoint. (Other transports such as WebSocket are possible, but they are custom rather than part of the standard.)
  2. **Initialize request**: The host (MCP client) sends an `initialize` request with the protocol version it supports, its client capabilities (e.g. whether it supports sampling requests from the server or exposes filesystem roots), and client metadata (name, version). This lets the server know what the host can handle before it starts exposing tools or sending async messages.

Example initialize request:

```json
{
  "jsonrpc": "2.0",
  "id": "init-001",
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "clientInfo": {"name": "Claude Desktop", "version": "0.7.8"},
    "capabilities": {"roots": {"listChanged": true}, "sampling": {}}
  }
}
```

  3. **Server initialize response**: The server responds with the protocol version it will use, its server capabilities (tools, resources, prompts, logging), and server metadata. If the server supports the version the host asked for, it echoes that version back; otherwise it answers with a version it does support, and the host disconnects if it can't speak that one. Either way, a mismatch surfaces during the handshake instead of causing weird undefined behavior mid-session when the host and server speak different dialects. The tradeoff here is that you have to manage versioning carefully when updating production servers, but that's far better than silent failures that take hours to debug.
  4. **Initialized notification**: After the server responds, the host sends a one-way `notifications/initialized` notification (JSON-RPC notifications don't get a response) to confirm the handshake is complete. Why add this extra step? It tells the server the host is ready, so the server can start sending async notifications (like log messages or resource update alerts) without the risk of the host dropping them. Again, tradeoff: one extra message, no more race conditions.
  5. **Active session**: Now the connection is active, and any number of requests, responses, and notifications can be sent. The host can list tools, list resources, call tools, fetch resources, and the server can send async updates back.
  6. **Termination**: Either side can terminate the connection, and MCP handles this at the transport level rather than with a dedicated shutdown message. For local servers, the host closes the server's stdin and the server exits (the host can escalate to killing the process if it doesn't). For remote servers, the host closes the HTTP connection, and the server cleans up any state tied to it.
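The handshake is easier to reason about as a small state machine. The sketch below is a host-side illustration under my own assumptions: `HandshakeTracker`, the `outbox` array, and the list of versions this hypothetical host supports are all invented for the example. The method names `initialize` and `notifications/initialized` are the ones the MCP spec uses.

```typescript
// Host-side view of the MCP handshake as a tiny state machine (illustrative).
type Phase = "connecting" | "initializing" | "ready" | "closed";

// Assumption: the protocol versions this hypothetical host can speak.
const SUPPORTED_VERSIONS = ["2025-03-26", "2024-11-05"];

class HandshakeTracker {
  phase: Phase = "connecting";
  negotiatedVersion: string | null = null;
  readonly outbox: object[] = []; // messages we would write to the transport

  // Transport is up: send the initialize request.
  start(): void {
    this.outbox.push({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: SUPPORTED_VERSIONS[0],
        capabilities: {},
        clientInfo: { name: "example-host", version: "0.0.1" },
      },
    });
    this.phase = "initializing";
  }

  // Server answered: check its version, then confirm with a notification.
  onInitializeResult(result: { protocolVersion: string }): void {
    if (this.phase !== "initializing") {
      throw new Error("initialize result arrived in the wrong phase");
    }
    if (!SUPPORTED_VERSIONS.includes(result.protocolVersion)) {
      this.phase = "closed"; // fail fast on a version we cannot speak
      return;
    }
    this.negotiatedVersion = result.protocolVersion;
    this.outbox.push({ jsonrpc: "2.0", method: "notifications/initialized" });
    this.phase = "ready";
  }

  // Tool calls are only legal once the handshake has completed.
  canCallTools(): boolean {
    return this.phase === "ready";
  }
}

const ok = new HandshakeTracker();
ok.start();
ok.onInitializeResult({ protocolVersion: "2024-11-05" });

const bad = new HandshakeTracker();
bad.start();
bad.onInitializeResult({ protocolVersion: "1999-01-01" });

console.log(ok.canCallTools(), bad.canCallTools());
// prints: true false
```

Keeping the phase explicit makes it hard to send a `tools/call` before the server has agreed on a version, which is exactly the class of bug the handshake exists to prevent.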

Core Message Flow: Host → Server → Tool → Response

Before we dive into a detailed tool call walkthrough, let's map out the high-level message flow that MCP follows for every tool call:

  1. A user asks the AI host a question that requires external data or action (e.g. "What did we decide in our last design meeting?")
  2. The LLM running on the host identifies that it needs to call a tool on your MCP server to answer the question
  3. The host (MCP client) constructs a properly formatted MCP JSON-RPC request and sends it over the pre-established transport to the MCP server
  4. The MCP server parses the request, routes it to the correct registered tool handler
  5. The tool executes your custom business logic (e.g. query the Obsidian API for recent meeting notes)
  6. The tool returns a result (or error) back to the MCP server core
  7. The MCP server wraps the result in a JSON-RPC response and sends it back to the host
  8. The host passes the result to the LLM, which generates a human-readable answer for the user

That's the full cycle. It's simple enough on the surface, but there are a lot of small details that trip up new implementers.

Tool Call Execution: A Detailed Walkthrough

Let's make this concrete with a real example: calling a `get_recent_note` tool on my Obsidian MCP server, to get the most recent meeting note. I'll show actual MCP messages along the way.

Step 1: After the LLM confirms it needs to call `get_recent_note`, the host constructs a `tools/call` request:

```json
{
  "jsonrpc": "2.0",
  "id": "req-12345",
  "method": "tools/call",
  "params": {
    "name": "get_recent_note",
    "arguments": {
      "tag": "meeting",
      "limit": 1
    }
  }
}
```

Step 2: The request is delivered over transport. For my local server, that's just writing the JSON to the server's stdin. For my remote team server, that's a POST request to the server's JSON-RPC endpoint.

Step 3: The MCP server parses the JSON and validates the basic JSON-RPC structure. It checks that the method exists, and that the requested tool is registered on the server. If the tool doesn't exist, it returns a protocol error immediately.

Step 4: The server routes the call to the registered `get_recent_note` tool handler, passing in the arguments along with any per-connection context, such as which session the request belongs to.

Step 5: The tool handler executes its custom business logic. In this case, it queries the Obsidian API for all notes tagged `meeting`, sorts them by modified date, picks the most recent one, and reads the raw content of that note.

Step 6: The tool handler returns the successful result back to the MCP server core. If something went wrong (e.g. no notes found), it returns an error result.

Step 7: The MCP server wraps the result in a JSON-RPC response and sends it back to the host. A successful response looks like this:

```json
{
  "jsonrpc": "2.0",
  "id": "req-12345",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "# Meeting with design team 2024-10-01\n\nDecisions:\n1. Ship the new onboarding flow v2 by end of month\n2. Need to get user feedback on the new pricing page before launch\n3. Follow up with engineering on API capacity for Black Friday\n\nAction items: @me to share feedback doc by EOD Friday"
      }
    ],
    "isError": false
  }
}
```

Step 8: The host receives the response, parses it, and passes the content to the LLM, which generates a natural language answer for the user. That's the full execution cycle.
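Steps 3 through 7 boil down to a lookup and a wrap. Here is a rough sketch of that routing, written without the SDK so the moving parts are visible. The `registry` map, `handleRequest`, and the stand-in handler body are hypothetical; the error codes are the standard JSON-RPC ones (`-32601` for an unknown method, `-32602` for invalid params such as an unknown tool name).

```typescript
// Sketch of server-side routing for tools/call. All names are hypothetical.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

interface RpcRequest {
  jsonrpc: "2.0";
  id: string | number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: string | number;
  result?: ToolResult;
  error?: { code: number; message: string };
}

// Kept synchronous for brevity; real handlers would usually be async.
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

const registry = new Map<string, ToolHandler>();

// Stand-in for the real Obsidian lookup.
registry.set("get_recent_note", (args) => {
  const tag = String(args["tag"] ?? "");
  return {
    content: [{ type: "text", text: `(most recent note tagged "${tag}")` }],
    isError: false,
  };
});

function handleRequest(req: RpcRequest): RpcResponse {
  // Step 3: validate the method and the tool name.
  if (req.method !== "tools/call") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
  const name = req.params?.name ?? "";
  const handler = registry.get(name);
  if (!handler) {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${name}` } };
  }
  // Steps 4-6: run the handler. Step 7: wrap its result with the same id.
  const result = handler(req.params?.arguments ?? {});
  return { jsonrpc: "2.0", id: req.id, result };
}

const reply = handleRequest({
  jsonrpc: "2.0",
  id: "req-12345",
  method: "tools/call",
  params: { name: "get_recent_note", arguments: { tag: "meeting", limit: 1 } },
});
console.log(JSON.stringify(reply));
```

The response always reuses the request's `id`, which is how the host matches replies to in-flight calls when several are outstanding at once.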

Resources vs. Tools: When to Use Which (And Why It Matters)

One of the most commonly misunderstood parts of MCP is the difference between resources and tools, and when to use each. I messed this up early on in my Obsidian server, and it blew out my LLM's context window before I even made my first tool call. Let's clear this up:

| Aspect | Resources | Tools |
|--------|-----------|-------|
| Core purpose | Expose data/context for the model to consume | Execute actions/parameterized queries |
| Identification | Unique URI | Name + JSON input schema |
| Invocation | Read via `resources/read` (can be done proactively by host) | Called explicitly via `tools/call` after LLM matches intent |
| Side effects | Almost always read-only (write support is niche) | Common (can modify external state, call APIs) |

Put simply: Resources are the data you expose, tools are the actions the model can take. To use my Obsidian server as an example:

  • Every individual note in the vault is a resource, with a unique URI like `obsidian://note/design-meeting-2024-10-01`
  • `get_recent_note`, `search_notes`, and `create_note` are tools, because they take input parameters and do something.

When should you use which? If you have a large set of static or semi-static data that the model will access individually, put them as resources. If you need the model to trigger an action, or query dynamic data with custom parameters, build a tool.

The tradeoffs here are critical: If I had made every one of my 400+ Obsidian notes a separate tool, the tool list alone would be 400+ JSON schema definitions, totaling tens of thousands of tokens, which blows out most LLM context windows. By keeping the tool list small (only 5 tools: get recent, search, create, update, delete) and making individual notes resources, I keep the initial context small, and let the model discover note URIs via the search tool before reading the resource content. That's exactly what I did after my first failed attempt, and it fixed the context bloat issue immediately.

Another common use case: If you're building an MCP server for a SQL database, the database schema is a resource that the model can reference any time, and `run_query` is a tool that the model uses to execute queries. That makes perfect sense: the schema doesn't change that often, so it can be loaded as a resource once, and the tool is for executing dynamic queries.
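The "search tool returns URIs, resource read returns content" split can be sketched in a few lines. This is an in-memory illustration under my own assumptions: the `notes` map, `searchNotes`, and `readResource` are hypothetical stand-ins, not the SDK's API, and the second note is made-up sample data.

```typescript
// Notes exposed as resources (keyed by URI) plus one small search tool.
interface Note {
  uri: string;
  title: string;
  tags: string[];
  body: string;
}

const notes = new Map<string, Note>();

function addNote(note: Note): void {
  notes.set(note.uri, note);
}

addNote({
  uri: "obsidian://note/design-meeting-2024-10-01",
  title: "Meeting with design team 2024-10-01",
  tags: ["meeting"],
  body: "Decisions: ship the new onboarding flow v2 by end of month.",
});
addNote({
  uri: "obsidian://note/reading-list", // made-up sample note
  title: "Reading list",
  tags: ["personal"],
  body: "Books to read this quarter.",
});

// Tool: a parameterized query that returns only URIs and titles,
// so the model pays a few tokens per hit instead of a full note.
function searchNotes(tag: string): { uri: string; title: string }[] {
  return Array.from(notes.values())
    .filter((note) => note.tags.includes(tag))
    .map(({ uri, title }) => ({ uri, title }));
}

// Resource read: fetch the full content of one note, on demand, by URI.
function readResource(uri: string): { contents: { uri: string; text: string }[] } {
  const note = notes.get(uri);
  if (!note) {
    throw new Error(`Unknown resource: ${uri}`);
  }
  return { contents: [{ uri, text: note.body }] };
}

const hits = searchNotes("meeting");
console.log(hits.length, hits[0].uri);
// prints: 1 obsidian://note/design-meeting-2024-10-01
console.log(readResource(hits[0].uri).contents[0].text);
```

The tool list stays constant no matter how many notes the vault holds; only the search results and the notes the model actually opens cost context.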

Error Handling Patterns in MCP

MCP has two distinct layers of error handling, which is another design choice that avoids confusion between protocol failures and application failures. Let's break them down:

  1. **JSON-RPC protocol level errors**: These are failures at the protocol level, like malformed JSON, unknown method, invalid request format, or mismatched protocol version. These are returned as standard JSON-RPC errors, with standardized error codes. For example, if you have a typo in the method name, you'll get a protocol error like this:

```json
{
  "jsonrpc": "2.0",
  "id": "req-12346",
  "error": {
    "code": -32601,
    "message": "Method not found",
    "data": {
      "details": "The method tools/calll (typo) does not exist on this server"
    }
  }
}
```

Protocol errors mean the request can't be processed at all, so the host usually can't recover from them without user intervention.

  2. **Application level errors**: These are errors where the protocol exchange itself succeeded, but the tool or resource request failed for a business logic reason (e.g. note not found, permission denied, API rate limited). MCP intentionally returns these as a successful response with `isError: true`, instead of a protocol error. Why? Because these are errors that the LLM can often handle gracefully. For example, if the model searches for a meeting note and doesn't find it, it can ask the user for a different tag or date, instead of the whole request failing. An application error looks like this:

```json
{
  "jsonrpc": "2.0",
  "id": "req-12345",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "No notes found with tag 'meeting' created in the last 7 days"
      }
    ],
    "isError": true
  }
}
```

The key tradeoffs for error handling: Always return human-readable error messages, because they go straight to the LLM, which needs to understand what went wrong to fix it. But don't include full stack traces or sensitive internal data in the error content, because that wastes context space and can leak secrets. I generally stick to 1-2 sentences of clear error text in the content field and keep internal debug details in server-side logs. For protocol-level errors, the `data` field of the JSON-RPC error object is the place for extra detail, which the host can log without sending it to the LLM.
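One way to enforce that rule is to funnel every tool through a wrapper that decides what the model gets to see. The sketch below is an assumption-heavy illustration: `runTool`, `userError`, and the generic fallback message are names and wording I made up, and the handler is synchronous only to keep the example short.

```typescript
// Wrap tool execution so the model sees short messages and logs get the detail.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

type UserFacingError = Error & { userFacing: true };

// Hypothetical helper: mark an error as safe to show to the model.
function userError(message: string): UserFacingError {
  return Object.assign(new Error(message), { userFacing: true as const });
}

function isUserFacing(err: unknown): err is UserFacingError {
  return err instanceof Error && (err as { userFacing?: unknown }).userFacing === true;
}

function runTool(fn: () => string, log: (detail: string) => void): ToolResult {
  try {
    return { content: [{ type: "text", text: fn() }], isError: false };
  } catch (err) {
    // Full detail (including any stack trace) stays on the server side.
    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
    log(detail);
    // Only messages explicitly marked safe reach the model; the rest get a generic line.
    const text = isUserFacing(err)
      ? err.message
      : "The tool failed unexpectedly. Try again or adjust the input.";
    return { content: [{ type: "text", text }], isError: true };
  }
}

const logs: string[] = [];
const record = (detail: string): void => {
  logs.push(detail);
};

const notFound = runTool(() => {
  throw userError("No notes found with tag 'meeting' created in the last 7 days");
}, record);

const crashed = runTool(() => {
  throw new Error("connect ECONNREFUSED 10.0.0.5:5432");
}, record);

console.log(notFound.content[0].text);
console.log(crashed.content[0].text);
```

Both failures come back as `isError: true` results rather than protocol errors, so the model can react to them, but only the first one exposes its message.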

Session State: Stateless Protocol, Stateful Servers

Once the handshake is done, MCP's messages don't carry application state: each tool call stands on its own, and the protocol doesn't require the server to store any data between calls. But MCP does support sessions for use cases that need them. Over the HTTP transport, the server can assign a session ID during initialization (returned in the `Mcp-Session-Id` header), and the host sends it back with every subsequent request.

A session corresponds to a single user interaction context, usually a single chat thread in the AI host. What do people use session state for? Common uses are storing user credentials per session, caching recent query results to speed up repeated calls, and keeping conversation-specific context that the tool needs across multiple calls.

The tradeoffs here are straightforward: If you're building a local single-user MCP server, session state is trivial: you can just store it in memory, no extra work needed. If you're building a remote multi-user MCP server, you have to decide between:

  • **Sticky sessions + in-memory state**: Simple to implement, but if your server restarts, all sessions are lost, and you can't scale out to multiple instances easily.
  • **Shared cache (Redis) for session state**: More scalable, but adds an extra dependency and operational overhead.

MCP intentionally doesn't mandate how you store session state, which gives you the flexibility to pick the right approach for your use case. But that flexibility comes with responsibility: you have to manage session cleanup yourself. Which brings me to the gotcha that cost me half a day of debugging.

My MCP Gotcha: Stale Sessions Caused an OOM Crash

When I built the remote version of my Obsidian MCP server for my team, I wanted to keep it simple, so I used in-memory storage for session state, which holds each user's vault access token and cached recent note content. I thought "we only have 8 people on the team, how much state can that be?" I was wrong.

I deployed the server, and everything worked great for two weeks. Then one Monday morning, I got a PagerDuty alert that the server was OOM killed. I logged into the server, checked the memory usage, and my 2GB droplet had 99% memory used, almost all of it in my in-memory session map.

What happened? Every time someone on my team opened a new chat thread in Claude Desktop, a new session was created with its own session ID, so a new entry was added to my session map. I never added any cleanup for old sessions. People open a new chat every day, so after two weeks, we had 187 stale sessions sitting in memory, each with a few megabytes of cached note content, adding up to almost 1.8GB of leaked memory. I forgot that every new chat gets a new session ID, even for the same user.

I fixed it by adding two things: first, I handled explicit session termination. With the HTTP transport, the host can send an HTTP `DELETE` carrying the session's `Mcp-Session-Id` header when a session is no longer needed, so I can delete the session from the map immediately. Second, I added a 24-hour TTL for all sessions, so even if that request never arrives, stale sessions get cleaned up automatically. That fixed the OOM issue, and it's been running stable for 3 months now.

The lesson I learned: Even if you're building a small server for a handful of users, always add session cleanup. It's 10 lines of code that saves you hours of debugging down the line.
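For reference, the cleanup really is about that small. This is a sketch rather than my exact server code: `SessionStore`, its method names, and the injectable clock (there so the example can be exercised without waiting 24 hours) are all illustrative assumptions.

```typescript
// In-memory session store with explicit end() and TTL-based sweep().
interface Session<T> {
  data: T;
  lastSeen: number; // epoch milliseconds of the last request in this session
}

class SessionStore<T> {
  private sessions = new Map<string, Session<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Fetch the session's data, creating it on first use, and refresh lastSeen.
  touch(id: string, create: () => T): T {
    const existing = this.sessions.get(id);
    if (existing) {
      existing.lastSeen = this.now();
      return existing.data;
    }
    const data = create();
    this.sessions.set(id, { data, lastSeen: this.now() });
    return data;
  }

  // Call this when the host signals that the session is over.
  end(id: string): boolean {
    return this.sessions.delete(id);
  }

  // Drop every session idle for longer than the TTL; returns how many were removed.
  sweep(): number {
    const cutoff = this.now() - this.ttlMs;
    let removed = 0;
    for (const [id, session] of Array.from(this.sessions.entries())) {
      if (session.lastSeen < cutoff) {
        this.sessions.delete(id);
        removed++;
      }
    }
    return removed;
  }

  get size(): number {
    return this.sessions.size;
  }
}

// Usage with a fake clock so the 24-hour TTL can be demonstrated instantly.
const HOUR = 60 * 60 * 1000;
let clock = 0;
const store = new SessionStore<{ token: string }>(24 * HOUR, () => clock);

store.touch("sess-a", () => ({ token: "token-a" }));
store.touch("sess-b", () => ({ token: "token-b" }));
store.end("sess-a"); // explicit end: removed immediately
clock += 25 * HOUR; // sess-b has now been idle for longer than the TTL
const removed = store.sweep();

console.log(removed, store.size);
// prints: 1 0
```

In a real server you would call `sweep()` on a timer (every few minutes is plenty) and call `end()` from whatever handler receives the host's end-of-session signal.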

Runnable MCP Code Examples

Here are two runnable examples using the official TypeScript MCP SDK that you can use to test MCP locally. First, a simple MCP server with a single tool:

```typescript
// index.ts - Simple calculator MCP server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Create the server instance. McpServer is the SDK's high-level wrapper:
// it provides the tool() and resource() helpers and declares the matching
// capabilities when you register something.
const server = new McpServer({ name: "simple-calculator", version: "1.0.0" });

// Register the add tool: name, description, input schema (zod), handler
server.tool(
  "add",
  "Add two numbers together",
  { a: z.number().describe("First number"), b: z.number().describe("Second number") },
  async ({ a, b }) => {
    return {
      content: [{ type: "text", text: `${a + b}` }]
    };
  }
);

// Start server over stdio. Log to stderr only: stdout carries the protocol.
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("Calculator MCP running");
}

main().catch(err => {
  console.error(err);
  process.exit(1);
});
```

Second example, adding a resource to the same server to show how resources work:

```typescript
// Add this to the index.ts above to add a welcome resource.
// Assumes `server` is an McpServer (from @modelcontextprotocol/sdk/server/mcp.js),
// which declares the resources capability automatically on registration.
import { ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";

// Register a parameterized welcome resource
server.resource(
  "welcome-message",
  new ResourceTemplate("welcome://{userId}", { list: undefined }),
  { description: "Custom welcome for a user", mimeType: "text/plain" },
  async (uri, { userId }) => {
    return {
      // Resource reads return `contents`, each entry tagged with its URI
      contents: [{
        uri: uri.href,
        text: `Welcome ${userId}! Use the add tool to sum two numbers.`
      }]
    };
  }
);
```

To run this, initialize a Node.js project, install the dependencies, build it, and add it to your Claude Desktop config following the official MCP docs. It will work out of the box.

Actionable Next Steps

Now that you understand how MCP works under the hood, here are concrete next steps to build on this:

  1. Spin up the sample server above, connect it to Claude Desktop, and test a few tool calls. This will take less than 15 minutes and will help you internalize how MCP works.
  2. Experiment with resources and tools by building a simple server that lists your local text files as resources and has a search tool to find files by name. This will help you get a feel for the right way to split functionality between the two.
  3. Test error handling by intentionally adding a bug to your tool that fails half the time, and see how Claude handles application-level errors vs protocol-level errors.
  4. If you're building a remote MCP server, add session expiration and cleanup before you deploy it, even if it's just for personal use. Don't make the same OOM mistake I did.
  5. Browse the public MCP ecosystem to see if there's a tool you use every day that doesn't have an MCP server yet, and build it. The ecosystem is still growing, and new contributors are welcome.

Last updated: April 5, 2026
