
We Made AutoMem Work With Voice AI (And It’s Pretty Cool)


So yesterday Claude and I spent the day solving a problem that’s been bugging me: AutoMem only worked with desktop apps.

Which is fine if you’re coding in Cursor or chatting in Claude Desktop. But what about ChatGPT? Claude mobile? ElevenLabs voice agents? All those platforms are cloud-based and can’t run local MCP servers.

We fixed it. And honestly, it turned out way cooler than I expected. πŸš€

The Problem: MCP Isn’t Built For The Cloud

Here’s the thing about MCP (Model Context Protocol): it’s designed for desktop apps. You run an MCP server locally via stdio, and apps like Claude Desktop connect to it through a configuration file.
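
For reference, a typical desktop config entry looks something like this (the package name here is illustrative; the NPM bridge in the AutoMem repo has the real one):

{
  "mcpServers": {
    "automem": {
      "command": "npx",
      "args": ["-y", "automem-mcp-bridge"]
    }
  }
}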

Works great! Until you want to use it with:

  • ChatGPT (cloud-only)
  • Claude.ai web interface
  • Claude mobile app
  • ElevenLabs voice agents
  • Literally any other cloud-based AI platform

They can’t connect to your local machine. They need HTTPS endpoints.

The official solution? MCP over SSE (Server-Sent Events). But nobody had actually built one for AutoMem yet.

So we did.

What We Built: The SSE Sidecar

Think of it as a bridge. A tiny Node.js service that:

  1. Exposes AutoMem as an MCP server over HTTPS
  2. Uses SSE for streaming (server pushes events to client)
  3. Accepts messages via HTTP POST (client sends JSON-RPC)
  4. Runs on Railway for $5/month (or the free trial credit if you’re just testing)

The architecture is honestly pretty elegant:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     Cloud AI Platform               β”‚
β”‚  (ChatGPT/Claude/ElevenLabs)        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚ HTTPS
           β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   SSE Sidecar (Node)    β”‚
    β”‚   β€’ GET /mcp/sse        β”‚
    β”‚   β€’ POST /mcp/messages  β”‚
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚ Internal HTTP
           β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   AutoMem API (Flask)   β”‚
    β”‚   β€’ FalkorDB + Qdrant   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The SSE sidecar is 323 lines of JavaScript. That’s it. The whole thing.

The Implementation (And Why SSE Is Weird)

SSE is… interesting. It’s basically HTTP long-polling on steroids:

  1. Client opens GET request to /mcp/sse
  2. Server keeps connection open and streams events back
  3. Client sends messages via separate POST requests to /mcp/messages
  4. Server routes responses back through the SSE stream
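
From the client’s side, that dance looks roughly like this (a simplified sketch that skips the MCP initialize handshake; https://host is a placeholder):

// The stream's first event announces where to POST, sessionId included
const es = new EventSource('https://host/mcp/sse');

es.addEventListener('endpoint', (e) => {
  // e.data is the POST path for this session
  fetch(new URL(e.data, 'https://host'), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', id: 1, method: 'tools/list' }),
  });
});

// JSON-RPC responses come back over the same SSE stream
es.onmessage = (e) => console.log('response:', e.data);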

The tricky part? Session management. Each SSE connection gets a unique sessionId, and you have to match POST messages to the right session.
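
Under the hood that’s just a Map keyed by the transport’s session ID. A minimal sketch:

// sessionId -> live connection, so each POST can find its SSE stream
const sessions = new Map();

// On connect, register; on disconnect, clean up
sessions.set(transport.sessionId, { transport, server });
res.on('close', () => sessions.delete(transport.sessionId));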

Oh, and keepalives. Proxies (like Railway’s) will timeout idle connections. So we send heartbeat pings every 20 seconds:

// SSE comment line every 20s so idle proxies don't kill the stream
const heartbeat = setInterval(() => {
  try { res.write(': ping\n\n'); } catch (_) { /* stream already closed */ }
}, 20000);
res.on('close', () => clearInterval(heartbeat)); // don't leak timers

That little : ping\n\n is an SSE comment line (clients ignore anything after a leading colon), and it keeps the connection alive. Without it? Random disconnects. With it? Rock solid.

Authentication: Two Ways (Because OAuth Sucks)

Here’s where it gets annoying. Different platforms support different auth methods:

ElevenLabs (the smart one):

  • Supports custom headers
  • Just send Authorization: Bearer <token>
  • Clean, secure, perfect

ChatGPT, Claude.ai, Claude Mobile (the frustrating ones):

  • Only support OAuth for custom connectors
  • Can’t send custom headers
  • Have to use URL parameters: ?api_token=<token>
  • Less secure (tokens in logs), but it works πŸ€·β€β™‚οΈ

We support both. The code checks for:

  1. Authorization: Bearer header (preferred)
  2. X-API-Key header (alternative)
  3. ?api_key= query parameter (fallback)
  4. AUTOMEM_API_TOKEN env var (for testing)

First one that exists wins.
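
Here’s a minimal version of that lookup (a sketch, not the repo’s exact code; since the post mentions both api_key and api_token as the query parameter name, this accepts either):

function getAuthToken(req) {
  const auth = req.headers['authorization'];
  if (auth?.startsWith('Bearer ')) return auth.slice(7);
  if (req.headers['x-api-key']) return req.headers['x-api-key'];
  if (req.query.api_key || req.query.api_token) {
    return req.query.api_key || req.query.api_token;
  }
  return process.env.AUTOMEM_API_TOKEN; // testing fallback
}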

The Deployment: Railway Template

Getting this running should be one-click. So we created a Railway template that deploys:

  1. AutoMem API (Flask + FalkorDB + Qdrant)
  2. SSE Sidecar (Node.js)
  3. Persistent volumes for both services

Everything preconfigured. Just:

  1. Click deploy button
  2. Set your AUTOMEM_API_TOKEN
  3. Get your Railway URL
  4. Add it to ChatGPT/ElevenLabs/whatever

Done.

Cost? $5/month for basic usage. (Railway has a free trial with $5 credit if you want to test first.)
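
Once it’s deployed, a quick sanity check from Node 18+ (run as an ES module; the URL is a placeholder for your Railway domain):

// Expect a 200 status and a text/event-stream content type
const res = await fetch('https://your-app.up.railway.app/mcp/sse', {
  headers: { Authorization: `Bearer ${process.env.AUTOMEM_API_TOKEN}` },
});
console.log(res.status, res.headers.get('content-type'));
process.exit(0); // the stream stays open otherwise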

What This Enables: Voice AI With Memory

Here’s the kickerβ€”this wasn’t just about making AutoMem work on mobile. It’s about voice AI.

ElevenLabs Conversational AI is ridiculously fast. Like, 60ms response time fast. But it had no memory between sessions.

Now it does. πŸŽ™οΈ

Your ElevenLabs agent can:

  • Remember your previous conversations
  • Recall decisions you made weeks ago
  • Build a knowledge graph of your preferences
  • Learn patterns over time

And it all happens at voice speed. No lag, no delays.

We’ve been testing it with AutoJack (my AI assistant), and it’s… honestly kind of wild? Having a voice conversation where the AI actually remembers context from Slack messages, email threads, and previous calls. It just works.

The Code (If You’re Curious)

The SSE server is open source (MIT license). Core flow:

When a client connects:

app.get('/mcp/sse', async (req, res) => {
  const token = getAuthToken(req);
  const client = new AutoMemClient({ endpoint, apiKey: token });
  const server = buildMcpServer(client);

  const transport = new SSEServerTransport('/mcp/messages', res);
  sessions.set(transport.sessionId, { transport, server });
  res.on('close', () => sessions.delete(transport.sessionId));

  await server.connect(transport); // connect() starts the transport and opens the SSE stream
});

When a client sends a message:

app.post('/mcp/messages', async (req, res) => {
  const sessionId = req.query.sessionId;
  const session = sessions.get(sessionId);
  if (!session) return res.status(404).json({ error: 'unknown sessionId' });
  await session.transport.handlePostMessage(req, res, req.body);
});

Tool handlers just map MCP calls to the AutoMem API:

case 'store_memory': {
  const r = await client.storeMemory(args);
  return { content: [{ type: 'text', text: `Memory stored: ${r.memory_id}` }] };
}

That’s… basically it. The MCP SDK does most of the heavy lifting. We just wire it up to AutoMem’s HTTP API.
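
If you’re curious how that wiring fits together, here’s a rough sketch of buildMcpServer using the standard @modelcontextprotocol/sdk server API (schema trimmed; the real version registers the full AutoMem tool set):

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

function buildMcpServer(client) {
  const server = new Server(
    { name: 'automem', version: '0.7.0' },
    { capabilities: { tools: {} } }
  );

  // Advertise the available tools
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [{
      name: 'store_memory',
      description: 'Store a memory in AutoMem',
      inputSchema: { type: 'object', properties: { content: { type: 'string' } } },
    }],
  }));

  // Dispatch tool calls to the AutoMem HTTP client
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    const { name, arguments: args } = request.params;
    switch (name) {
      case 'store_memory': {
        const r = await client.storeMemory(args);
        return { content: [{ type: 'text', text: `Memory stored: ${r.memory_id}` }] };
      }
      default:
        throw new Error(`Unknown tool: ${name}`);
    }
  });

  return server;
}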

Current Status: Shipping Now

This is live as of AutoMem 0.7.0 (Oct 17, 2025).

Working platforms:

  • βœ… ChatGPT (developer mode)
  • βœ… Claude.ai web
  • βœ… Claude mobile (iOS/Android)
  • βœ… ElevenLabs Agents

Coming soon:

  • Better documentation (setup guides for each platform)
  • Railway template improvements
  • Multi-tenant token scoping (if people actually use this)

Try It

If you want to add AutoMem to ChatGPT or use it with ElevenLabs:

  1. Deploy to Railway: Use the template in the AutoMem repo
  2. Read the docs: Check out docs/MCP_SSE.md for platform-specific setup
  3. Join the conversation: Open an issue if something breaks

Or just use the NPM MCP bridge if you’re sticking with desktop apps. That works too.

What’s Next

We’re exploring a few directions:

Short term:

  • Comprehensive setup guides for each platform
  • Better error messages when auth fails
  • Rate limiting and billing hooks

Medium term:

  • WebSocket support (lower latency than SSE)
  • Webhook triggers (push memories to clients)
  • Multi-user scoping (different tokens = different memory graphs)

Long term:

  • Voice-first AutoMem interface (conversational memory queries)
  • Real-time memory consolidation during calls
  • Shared memories between agents

But honestly? The current version is already useful. Voice AI with persistent memory is way more powerful than I expected.

And it’s all running on a $5/month Railway deployment. πŸš€

The Bottom Line

If you’re building voice AI or using cloud-based AI platforms, you probably need persistent memory.

AutoMem now works everywhere. Desktop, web, mobile, voice. Same API, same graph-vector architecture, same 11 relationship types. Just accessible from anywhere over HTTPS.

And the SSE sidecar? 323 lines. Sometimes the best solutions are the simple ones.

– Jack 🧠

P.S. Huge credit to Claude (Sonnet 4.5) for co-building this. We paired on the architecture, debugging the session management, and writing all the docs. This is what “AI-assisted development” actually looks likeβ€”I describe the problem, Claude proposes solutions, I push back on the dumb ideas, we iterate until it works. And it really works.


Tech Stack:

  • Node.js 18 + Express
  • @modelcontextprotocol/sdk 1.20.0
  • AutoMem API (Flask + FalkorDB + Qdrant)
  • Railway for hosting
