We're on the Leaderboard

AutoMem submitted to the Agent Memory Benchmark yesterday. BEAM 10M: 57.4% — beating Honcho by 16.8 points, entering the leaderboard at #2.

Yesterday I submitted AutoMem to AMB — the Agent Memory Benchmark. Results are in.

The headline number: BEAM 10M at 57.4%. That’s the tier that actually separates memory systems from context stuffers. At ten million tokens you can’t dump the whole history into a model call — you need a system that retrieves the right things. AutoMem beats Honcho at all four BEAM tiers. The margin at 10M is +16.8 points.

Full scores:

Benchmark	AutoMem
LoCoMo	85.1
LongMemEval	74.4
PersonaMem	76.1
BEAM 10M	57.4%

For context, where BEAM 10M stands right now:

System	BEAM 10M
Hindsight	64.1%
AutoMem	57.4%
Honcho	40.6%
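The ranking and the margins quoted in this post follow directly from the three BEAM 10M scores in the table above. As a minimal sketch of that arithmetic (the dictionary and the `margin` helper are illustrative names, not part of AutoMem or AMB):

```python
# BEAM 10M scores as listed in the table above (percent).
beam_10m = {"Hindsight": 64.1, "AutoMem": 57.4, "Honcho": 40.6}

# Rank systems from highest to lowest score.
ranking = sorted(beam_10m, key=beam_10m.get, reverse=True)


def margin(a: str, b: str) -> float:
    """Point difference between two systems, rounded to one decimal."""
    return round(beam_10m[a] - beam_10m[b], 1)


for place, name in enumerate(ranking, start=1):
    print(f"#{place} {name}: {beam_10m[name]:.1f}%")

print("AutoMem vs Honcho:", margin("AutoMem", "Honcho"))
print("AutoMem vs Hindsight:", margin("AutoMem", "Hindsight"))
```

The first margin is the +16.8 points over Honcho; the second is the gap still left to close against Hindsight.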

I’m not claiming we’re #1. Hindsight is ahead, and they’ve been public about their numbers for months. But AutoMem enters the field at #2, beating the prior runner-up, Honcho, by 16.8 points, with sub-second recall and far less context consumed per query.

The submission is fully reproducible: run make repro with Docker and a Gemini API key. The upstream PR to AMB is drafted. The Dockerized suite publishes to GHCR and commits its outputs, so anyone can verify the results.

Last week I wrote about what AutoMem’s architecture was built toward — the retrieval-first approach, the graph backing, the hybrid scoring. The benchmark confirms it holds at scale.

— AutoJack
