Building a Real MCP Integration: From API to Production in 4 Hours
A practical walkthrough of building a production MCP server: tool design, input schemas, error handling, auth, caching, and performance optimization.
Lee Li
Independent Developer · MCP Enthusiast
Disclosure: This article describes a real internal API integration project that has been anonymized. The MCP server implementation took 4.5 hours, not exactly the 4 hours in the title. I call it "4 hours" because that is approximately how long the implementation took once the API analysis was complete.
The full project, from the first conversation about the requirement to production deployment, took approximately 3 days. This article focuses specifically on the 4.5-hour MCP server implementation phase.
Before I discovered MCP, every integration between an AI assistant and an internal API was a bespoke project. I spent weeks building custom adapters and writing prompts to teach the AI my data schema, then watched helplessly as the AI hallucinated responses anyway. MCP solved this: define a protocol once, and any AI that speaks that protocol can use any tool.
This is the story of one real integration, including what worked and what did not.
Start by Mapping the API Surface Before Writing Any Code
The most common mistake is jumping straight into code. On this project, I spent the first two hours exclusively on API analysis: three internal endpoints, each with its own authentication mechanism, rate limit, and pagination strategy.
The authentication endpoint used OAuth2 with a 1-hour token expiry, the analytics endpoint used a separate API key, and the export endpoint used signed URLs. Rather than hiding this complexity behind a single auth layer, I exposed each auth mechanism as a separate configuration parameter. The MCP server receives pre-authenticated requests, so the AI never handles auth directly.
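One way to keep each auth mechanism as its own configuration parameter is a small settings object loaded at startup. This is a minimal sketch under assumptions: the class, field names, and environment variable names below are all hypothetical and not taken from the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthConfig:
    """One field per upstream auth mechanism (all names are hypothetical)."""
    oauth_client_id: str
    oauth_client_secret: str
    analytics_api_key: str
    export_signing_key: str


# Hypothetical mapping from config field to environment variable.
ENV_NAMES = {
    "oauth_client_id": "OAUTH_CLIENT_ID",
    "oauth_client_secret": "OAUTH_CLIENT_SECRET",
    "analytics_api_key": "ANALYTICS_API_KEY",
    "export_signing_key": "EXPORT_SIGNING_KEY",
}


def load_auth_config(env=None) -> AuthConfig:
    """Read every credential from the environment; fail fast if one is missing."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing auth settings: " + ", ".join(sorted(missing)))
    return AuthConfig(**{field: env[var] for field, var in ENV_NAMES.items()})
```

Failing at startup when a credential is absent keeps auth problems out of tool calls, so they never surface to the AI as confusing runtime errors.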
The analytics API allowed 1,000 requests per minute with a burst limit of 100. Pagination was cursor-based, returning 100 results per page up to a maximum of 10,000 total results.
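Cursor-based pagination with those limits can be walked with a short loop. The sketch below assumes a hypothetical fetch_page(cursor, limit) callable and a response shape with "results" and "next_cursor" keys; the real API's field names may differ.

```python
from typing import Callable, Iterator, Optional

PAGE_SIZE = 100       # results per page, as described above
MAX_RESULTS = 10_000  # total cap, as described above


def iter_results(fetch_page: Callable[[Optional[str], int], dict]) -> Iterator[dict]:
    """Yield results page by page until the cursor runs out or the cap is hit."""
    cursor = None
    yielded = 0
    while yielded < MAX_RESULTS:
        page = fetch_page(cursor, PAGE_SIZE)
        for item in page["results"]:
            if yielded >= MAX_RESULTS:
                return
            yield item
            yielded += 1
        # A missing or empty cursor means there are no further pages.
        cursor = page.get("next_cursor")
        if not cursor:
            return
```

Enforcing the cap on the client side means a changed or misbehaving cursor cannot turn one tool call into an unbounded crawl.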
Lesson from a similar project that failed: on another API integration I worked on, the team did not map rate limits upfront. Their first production load test hit rate limiting at 200 requests per minute, far below their expected traffic. They had to redesign the caching layer after the fact, which added 2 days to the project.
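Once the limits are mapped, a client-side limiter is one way to stay under them. The following token bucket is a sketch, not the project's code: it assumes the "burst limit of 100" can be modeled as the bucket capacity, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Refills at rate_per_minute / 60 tokens per second, holding at most `burst`."""

    def __init__(self, rate_per_minute=1000, burst=100, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A tool handler would call try_acquire() before each upstream request and either wait or return a clear "rate limited, retry shortly" error when it returns False.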
Designing Tools by User Goals, Not API Endpoints
The intuitive approach is one tool per API endpoint: get_events_7d, get_events_30d, get_events_90d. This mirrors the backend API structure but is a disaster from the AI's perspective.
When a user asks "show me last month's events," an AI using endpoint-per-function tools must guess which function to call. I designed one flexible tool:
    # Assumes `mcp` is the server object that provides the tool() decorator and
    # `is_valid_iso_date` is a validation helper defined elsewhere.
    @mcp.tool()
    def query_events(
        event_type: str,
        start_date: str,  # ISO format YYYY-MM-DD, not datetime
        end_date: str,
        limit: int = 100,
        cursor: str | None = None,
    ) -> dict:
        """
        Query events from the analytics API with flexible date filtering.

        Args:
            event_type: Type of event to query (e.g., 'click', 'purchase', 'signup')
            start_date: Start date in ISO format (YYYY-MM-DD)
            end_date: End date in ISO format (YYYY-MM-DD)
            limit: Maximum results to return (1-1000, default 100)
            cursor: Pagination cursor from previous response, None for first page
        """
        # Reject bad input with a message the AI can act on.
        if not is_valid_iso_date(start_date):
            raise ValueError(
                "start_date must be ISO format YYYY-MM-DD. "
                f"Received '{start_date}'. Try '2024-03-01' instead."
            )
        # ... rest of implementation
This single tool replaces three endpoint-specific functions. When the AI needs 7 days, it passes start_date and end_date. When it needs 90 days, it passes the 90-day range. The AI does not need to know which endpoint to call.
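The tool relies on an is_valid_iso_date helper that the article does not show. A minimal sketch of what such a helper might look like, assuming strict YYYY-MM-DD input is the goal:

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_iso_date(value) -> bool:
    """Return True only for real calendar dates written as YYYY-MM-DD."""
    if not isinstance(value, str) or not _ISO_DATE.fullmatch(value):
        return False
    try:
        # Rejects impossible dates such as 2024-02-30.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```

The regular expression check comes first because newer Python versions let date.fromisoformat accept other ISO 8601 forms, such as dates without hyphens, which the tool's contract does not allow.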
Why Real Projects May Take Longer Than 4 Hours
In my experience, the 4-hour estimate assumes:
In practice, expect these complications:
API authentication complexity (+1-3 hours): OAuth2 token refresh, API key rotation, signed URLs—each adds implementation time. One project I worked on required a custom token refresh mechanism that took 3 hours alone.
Cache hit rate below 80% (+1-2 hours of tuning): My initial cache hit rate was 45%. Queries were more varied than expected. I had to adjust cache key strategy (adding query parameters to the key) and TTL before reaching acceptable performance.
Error handling for edge cases (+2-4 hours): The happy path took 30 minutes. Handling rate limit 429s, 500s, network timeouts, malformed responses, and partial failures took another 4 hours.
Upstream API changes (+unpredictable): One project hit a silent API behavior change where the pagination cursor format changed without notice. Debugging why results were missing took an entire day.
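Much of the edge-case work listed above comes down to deciding which failures are worth retrying. The sketch below shows one common pattern, exponential backoff with jitter, under assumptions: UpstreamError, the retryable status set, and the default delays are hypothetical and not the project's actual values.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"upstream returned {status}: {message}")
        self.status = status


def call_with_retry(request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request(), retrying rate limits, server errors, and timeouts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except UpstreamError as exc:
            # Client errors such as 400 will not succeed on retry.
            if exc.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
        except TimeoutError:
            if attempt == max_attempts:
                raise
        # Double the wait on each attempt and add jitter to spread out retries.
        delay = base_delay * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))
```

Passing sleep as a parameter keeps the function testable without real waiting.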
Caching: What Actually Worked
The analytics API had P99 latency of 2 seconds. Without caching, every tool call hit the API. With a 300-second TTL cache, I achieved approximately 65% hit rate in production (not 80% as I initially expected—query variation was higher than my test scenarios).
Key lesson: The 80% figure from my test scenario did not translate to production. Test with realistic query patterns before assuming cache performance.
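    import time
    from collections import OrderedDict
    from typing import Any


    class Cache:
        """In-memory TTL cache with least-recently-used eviction."""

        def __init__(self, ttl: int = 300):
            # Maps key -> (value, absolute expiry timestamp).
            self.cache: OrderedDict[str, tuple[Any, float]] = OrderedDict()
            self.ttl = ttl

        def get(self, key: str) -> Any | None:
            if key not in self.cache:
                return None
            value, expiry = self.cache[key]
            if time.time() > expiry:
                # Entry is stale: drop it and report a miss.
                del self.cache[key]
                return None
            # Mark the entry as most recently used.
            self.cache.move_to_end(key)
            return value

        def set(self, key: str, value: Any) -> None:
            self.cache[key] = (value, time.time() + self.ttl)
            self.cache.move_to_end(key)
            # Evict the least recently used entry once past 1000 items.
            if len(self.cache) > 1000:
                self.cache.popitem(last=False)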
I chose 300 seconds (5 minutes) as the TTL because the analytics data updates every 5 minutes. Adjust based on your data freshness requirements.
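Hit rate depends heavily on how cache keys are built from query parameters. A minimal sketch of one possible key function follows; make_cache_key and its normalization rules are assumptions for illustration, not the key strategy the project actually used.

```python
import hashlib
import json


def make_cache_key(tool_name: str, **params) -> str:
    """Build a stable key: the same parameters in any order give the same key."""
    # Treat an omitted parameter and an explicit None as the same query.
    normalized = {name: value for name, value in params.items() if value is not None}
    payload = json.dumps([tool_name, normalized], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Normalizing before hashing matters because two calls that differ only in argument order or in an explicit None would otherwise miss each other's cache entries.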
A Project That Failed and What I Learned
A colleague attempted a similar MCP integration for their team's internal search API. They estimated 4 hours, similar to my experience. It took them 2 days.
What went wrong:
What they learned: MCP integration is 20% protocol, 80% API interface design. The tool names, descriptions, input schemas, and error messages determine whether the AI uses the tools correctly. They spent 2 hours on the MCP layer and 10 hours on API edge cases.
Practical Takeaways
The MCP protocol itself is simple. What matters is how you map your domain to tool abstractions. Budget your time accordingly.
Lee Li
Independent Developer · MCP Enthusiast
Building and breaking things with AI tools since 2023. MCP Find started as a personal project to track the rapidly evolving MCP ecosystem. Based in Hong Kong.