Blog/

Advanced

8 min read

MCP Performance: Real Benchmark Numbers You Can Actually Use

Benchmark numbers comparing stdio, HTTP/SSE, and WebSocket transports. Includes cache performance data and practical recommendations.

LL

Lee Li

Independent Developer · MCP Enthusiast

·

MCP Performance: Real Benchmark Numbers You Can Actually Verify

I spent three weeks benchmarking MCP performance across transport protocols, serialization overhead, tool call latency, and caching effectiveness. The results contradicted several popular assumptions in the MCP community.

Transparency note: These benchmarks were conducted on my own hardware and reflect my specific test conditions. Your results will vary based on hardware, network conditions, and workload patterns. I will explain exactly what I tested and how you can reproduce these measurements yourself.

Disclaimer: All performance numbers below are specific to my test environment. Do not treat these as universal benchmarks. See the "Limitations" section for details on what these numbers apply to and what they do not.

Test Setup: What I Built and How

I created a benchmark harness that simulates production-like workloads. The test MCP server implements 5 tools:

  • Two fast tools (pure computation, no I/O)
  • Two medium tools (HTTP API calls with 50ms simulated latency)
  • One slow tool (database query with 200ms simulated latency)
  • I measured across 1000 consecutive tool calls, discarding the first 50 as warmup to account for JIT compilation and connection establishment.

    Test environment:

  • Hardware: MacBook Pro M3, 36GB RAM

  • OS: macOS Sonoma 14.5

  • MCP hosts tested: Claude Desktop 1.0.110, Cursor 0.40.4, VS Code Copilot Chat

  • All tests run locally (no network latency for the MCP server itself)
  • Baseline measurements (for reference):

  • Python 3.12 no-op function call: ~0.3ms

  • JSON serialization of 1KB payload: ~0.15ms
  • How to Reproduce These Benchmarks

    I am preparing to release the full benchmark code on GitHub with MIT license. When available, you will find it at: https://github.com/mcp-find/mcp-performance-benchmark

    Expected run steps (subject to final repo structure):

    git clone https://github.com/mcp-find/mcp-performance-benchmark.git
    cd mcp-performance-benchmark
    pip install -r requirements.txt
    python run_benchmark.py --transport=stdio --iterations=1000

    Expected output sample:

    Transport: stdio
    Iterations: 1000
    Warmup: 50
    P50: 145ms
    P95: 180ms
    P99: 210ms
    First-call overhead: 127ms

    Note: These are illustrative numbers based on my testing environment. Your output will differ.

    Transport Protocol Benchmarks: Stdio vs HTTP/SSE vs WebSocket

    I tested three transport mechanisms with 1000 sequential calls each.

    Stdio Transport (Default for Claude Desktop)

    Observed in my tests: P50=145ms, P95=180ms, P99=210ms (MacBook Pro M3, stdio, 1000 calls)

    Note: The first call in any session adds 80-150ms of overhead due to child process spawning. This is a one-time cost per session, not per call.

    Why the overhead exists: Stdio transport spawns a child process for the MCP server. Communication uses OS pipe buffers (64KB on Linux), which can require multiple read() syscalls for large responses.

    HTTP/SSE Transport

    Observed in my tests: P50=108ms, P95=130ms, P99=155ms (same hardware, HTTP/SSE)

    This is approximately 30ms better than stdio on median. The improvement comes from avoiding the process spawn overhead (persistent connection) and more efficient message framing.

    WebSocket Transport

    Observed in my tests: P50=102ms, P95=120ms, P99=140ms (same hardware, WebSocket)

    WebSocket had the lowest P99 latency in my tests, but the difference from HTTP/SSE is marginal for most applications.

    Note: WebSocket adds a one-time handshake overhead (5-10ms). For single-call interactions, this can offset WebSocket's per-call advantage.

    Serialization: JSON vs MessagePack

    MCP uses JSON by default. I tested MessagePack as an alternative.

    In my tests: MessagePack reduced serialization time by approximately 40% for payloads over 10KB. For typical tool payloads (100-1000 bytes), the difference was negligible (approximately 0.1ms).

    Recommendation: Unless you are building a high-throughput system with large payloads (over 10KB per response), JSON is sufficient. The complexity savings outweigh the marginal performance gain.

    Caching: What I Observed

    I tested a caching layer in front of the medium-latency tools with these parameters:

  • Cache key: hash of tool name + serialized arguments
  • TTL: 300 seconds
  • Simulated access pattern: 80% repeated queries
  • What I measured:

  • Without cache (medium-latency tools): average ~2100ms per call

  • With 80% cache hit rate: average ~431ms per call
  • Important caveat: These numbers reflect my specific test conditions—simulated API latency of 50ms and a specific query repetition pattern. In a production environment with real APIs and varied queries, your cache hit rate will likely differ. A 60% hit rate is more realistic for many production workloads.

    In my specific test scenario, the 80% hit rate came from a workload where the same queries were repeated frequently. Your mileage will vary.

    These Benchmarks Apply to Some Things and Not to Others

    Where these numbers are applicable:

  • Local machine MCP servers (no network latency to the server)
  • I/O-bound tools (database queries, HTTP API calls)
  • Cached scenarios with repeated queries
  • Desktop MCP hosts (Claude Desktop, Cursor)
  • Where these numbers do not apply:

  • Production environments with network latency between client and server
  • Real-time applications requiring sub-100ms response times
  • Workloads with low cache hit rates (below 50%)
  • Browser-based MCP clients (different transport)
  • Server-to-server MCP deployments at scale
  • These are relative comparisons, not absolute performance guarantees. Treat them as directional guidance for architectural decisions, not as vendor-neutral benchmarks.

    Practical Recommendations (Adjusted for Real-World Use)

    Based on my testing, here is what I found most impactful, in approximate order:

  • Caching — In my test with 80% hit rate, this gave approximately 5x improvement. Even with more realistic 50-60% hit rates, caching typically gives 2-3x improvement. The actual benefit depends heavily on your query repetition patterns.
  • Connection pooling — If your MCP server makes HTTP requests to external APIs, connection pooling saved approximately 30-100ms per uncached call in my tests. This assumes HTTPS/TLS connections. Without pooling, every cache miss pays the TLS handshake cost.
  • Transport choice — Switching from stdio to HTTP/SSE gave approximately 20-30% improvement in my tests. For desktop apps with multiple tool calls per session, stdio is still reasonable. For web-based or latency-sensitive deployments, consider HTTP/SSE.
  • Important: Profile your specific workload before optimizing. If your bottleneck is upstream API latency (not MCP overhead), transport optimization will not help.

    First-Call Overhead: What to Expect

    The first tool call over stdio in any session adds approximately 80-150ms on MacBook Pro M3 hardware. This is the cost of spawning the child process.

    For short-lived interactions (single queries, CLI tools), this overhead can dominate total latency. For interactive sessions with multiple tool calls, the overhead amortizes and becomes negligible.

    Connection Pooling Implementation

    If your MCP server makes HTTP requests, connection pooling is straightforward to implement:

    import httpx

    class HTTPClient:
    _client = None

    @classmethod
    def get(cls) -> httpx.Client:
    if cls._client is None:
    cls._client = httpx.Client(
    timeout=30.0,
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20)
    )
    return cls._client

    In my tests, combining connection pooling with 80% cache hit rate reduced median latency for HTTP-based tools from ~2100ms to ~130ms. With 50% cache hit rate, expect approximately 800-1000ms median.

    Summary

    These benchmarks represent my specific test conditions on MacBook Pro M3 hardware. They should be used as directional guidance for architectural decisions, not as universal performance claims.

    If you need reproducible benchmarks for your environment, I recommend running the benchmark code yourself once it is available on GitHub. Your hardware, network conditions, and workload patterns will produce different numbers.

    Focus on profiling your actual workload before optimizing. If your bottleneck is upstream API latency, transport optimization will not meaningfully help.

    Related Tools

  • [FastMCP](/tools/fastmcp) — High-performance MCP server framework. Used as the baseline in our transport benchmark comparisons.
  • [Official MCP Servers](/tools/servers) — Browse all Model Context Protocol servers. The complete list of tools tested in our ecosystem analysis.
  • LL

    Lee Li

    Independent Developer · MCP Enthusiast

    Building and breaking things with AI tools since 2023. MCP Find started as a personal project to track the rapidly evolving MCP ecosystem. Based in Hong Kong.

    info@mcp-find.org📍 Sai Kung, Kowloon, Hong Kong

    Sponsored