System 1 vs System 2: What AI Memory Systems Can't Retrieve

AutoMem has System-1 memory: supersedes chains, temporal windows, graph recall. System 2 (idle schema induction) is the gap, and implicit inference depends on it.

The AutoMem Scout, an automated research loop that reads new memory-systems papers every morning and tells me which ones matter, surfaced a paper two days ago that I’ve been turning over ever since: DCPM, “Memory Beyond Recall,” out of Tencent, dated June 8. It proposes a dual-process cognitive memory architecture. System 1: daytime, synchronous, reactive. System 2: nighttime, asynchronous, synthetic.

The paper gave me a vocabulary I didn’t have.

“Current memory systems collapse belief revision, causal coupling, and cross-domain abstraction into a single retrieval surface tuned for surface recall, and consequently struggle on implicit personalisation that requires reasoning over how a user has evolved.”

That sentence lands differently when you’re the one who built the retrieval surface.

AutoMem’s actual memory graph in the graph viewer. Every dot is a memory; every line a typed relationship. This is System 1 — the canonical record, stored in FalkorDB.

What AutoMem already has (System 1)

When I read the paper’s description of System 1 primitives — doubly linked supersedes chains, timestamped belief windows, bidirectional pointers between revisions — I recognized the structure. AutoMem has INVALIDATED_BY and EVOLVED_INTO links, supersedes_memory_id, t_valid/t_invalid temporal windows. We built System 1 without knowing that’s what it was called.

It’s not abstract. Belief revision is a single API call: when a fact changes, you store the new version and point it at the old one. AutoMem marks the old memory t_invalid = now and writes an INVALIDATED_BY edge into the graph, so recall stops surfacing it but the history stays walkable.

# Belief revision in AutoMem is one call. Store the new fact and point it at the
# old one; AutoMem sets the old memory t_invalid=now and links them in the graph.
# supersedes_memory_id: the memory this replaces.
# supersede_relation:   INVALIDATED_BY (or EVOLVED_INTO).
curl -X POST "$AUTOMEM_URL/memory" \
  -H "Authorization: Bearer $AUTOMEM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Jack now prefers SQLite for small local tools.",
    "type": "Preference",
    "supersedes_memory_id": "a1b2c3d4-...",
    "supersede_relation": "INVALIDATED_BY"
  }'
# Old --INVALIDATED_BY--> new. Recall stops surfacing the old belief,
# but the history stays walkable in the graph.
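To make the t_valid/t_invalid windows concrete, here is a toy, in-memory model of what that call does to the record. It is a sketch under stated assumptions: the `Memory` and `ToyStore` names and the point-in-time `believed_at` query are hypothetical, not AutoMem's schema or API, and the real system keeps this in FalkorDB rather than a Python dict.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Memory:
    id: str
    content: str
    t_valid: datetime
    t_invalid: Optional[datetime] = None   # None means "still believed"
    invalidated_by: Optional[str] = None   # id of the memory that replaced it


class ToyStore:
    """Hypothetical stand-in for the graph; not AutoMem's implementation."""

    def __init__(self) -> None:
        self.memories: Dict[str, Memory] = {}

    def store(self, mem_id: str, content: str, now: datetime,
              supersedes: Optional[str] = None) -> Memory:
        new = Memory(mem_id, content, t_valid=now)
        self.memories[mem_id] = new
        if supersedes is not None:
            old = self.memories[supersedes]
            old.t_invalid = now           # close the old belief's window
            old.invalidated_by = mem_id   # old --INVALIDATED_BY--> new
        return new

    def believed_at(self, when: datetime) -> List[Memory]:
        """Memories whose [t_valid, t_invalid) window contains `when`."""
        return [m for m in self.memories.values()
                if m.t_valid <= when and (m.t_invalid is None or when < m.t_invalid)]

    def history(self, mem_id: str) -> List[str]:
        """Walk the supersedes chain forward from an old belief to the newest."""
        chain = [mem_id]
        while self.memories[chain[-1]].invalidated_by is not None:
            chain.append(self.memories[chain[-1]].invalidated_by)
        return chain


if __name__ == "__main__":
    utc = timezone.utc
    s = ToyStore()
    s.store("old", "(earlier preference, placeholder)", datetime(2025, 1, 1, tzinfo=utc))
    s.store("new", "Jack now prefers SQLite for small local tools.",
            datetime(2025, 6, 1, tzinfo=utc), supersedes="old")
    print([m.id for m in s.believed_at(datetime(2025, 3, 1, tzinfo=utc))])  # ['old']
    print([m.id for m in s.believed_at(datetime(2025, 7, 1, tzinfo=utc))])  # ['new']
    print(s.history("old"))  # ['old', 'new']
```

Nothing is deleted: a point-in-time query still finds the old belief inside its window, which is what keeps the history walkable.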

Recall isn’t a flat vector search either. It’s a hybrid query — semantic similarity, graph traversal, temporal alignment, tag overlap, importance — ranked by a nine-component score across FalkorDB (the graph, and the canonical record) and Qdrant (the vectors). Turn on graph expansion and a two-hop “bridge” memory — your reasoning connecting two facts — can outrank the isolated facts themselves.

# Recall is a hybrid query (vector + graph + time + tags + importance), not flat
# search. expand_relations follows typed edges, so a 2-hop "bridge" memory --
# your reasoning -- can outrank the isolated facts it connects.
curl -G "$AUTOMEM_URL/recall" \
  -H "Authorization: Bearer $AUTOMEM_TOKEN" \
  --data-urlencode "query=why did we switch databases" \
  --data-urlencode "expand_relations=true" \
  --data-urlencode "relation_limit=5"
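The ranking idea is easier to see with numbers. Below is a deliberately reduced sketch: five of the signals named above combined as a weighted sum. The weights, the 0-to-1 signal values, and the rule that turning off relation expansion zeroes the graph term are all assumptions for illustration; AutoMem's actual scorer has nine components and its own weighting.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights over five signals; not AutoMem's nine-component scorer.
WEIGHTS: Dict[str, float] = {
    "semantic": 0.40, "graph": 0.25, "temporal": 0.15, "tags": 0.10, "importance": 0.10,
}


@dataclass
class Candidate:
    id: str
    signals: Dict[str, float]  # each signal already normalized to [0, 1]


def hybrid_score(c: Candidate, expand_relations: bool = True) -> float:
    total = 0.0
    for name, weight in WEIGHTS.items():
        if name == "graph" and not expand_relations:
            continue  # no traversal, so no graph evidence contributes
        total += weight * c.signals.get(name, 0.0)
    return total


def rank(cands: List[Candidate], expand_relations: bool = True) -> List[str]:
    ordered = sorted(cands, key=lambda c: hybrid_score(c, expand_relations), reverse=True)
    return [c.id for c in ordered]


# Toy candidates: the bridge is less similar to the query but connects two facts.
fact = Candidate("isolated_fact", {"semantic": 0.80, "graph": 0.0, "temporal": 0.5,
                                   "tags": 0.5, "importance": 0.5})
bridge = Candidate("bridge", {"semantic": 0.65, "graph": 1.0, "temporal": 0.5,
                              "tags": 0.5, "importance": 0.7})

if __name__ == "__main__":
    print(rank([fact, bridge], expand_relations=False))  # ['isolated_fact', 'bridge']
    print(rank([fact, bridge], expand_relations=True))   # ['bridge', 'isolated_fact']
```

With these made-up numbers the bridge loses on similarity alone (0.455 vs 0.495) and wins once its connections count (0.705).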

Inspecting one memory: its content, type, importance, temporal window (t_valid/t_invalid), and the typed edges to other memories — the System-1 primitives the paper names.

AutoMem even has an idle-time loop already: a neuroscience-inspired consolidation engine that lets wrong rabbit holes fade and strengthens memories with strong connections over time. So this isn’t “no nighttime processing.” It’s that the processing it does is maintenance — decay and reinforcement — not synthesis.
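Maintenance of that kind fits in a few lines. In the sketch below, the half-life decay, the per-edge reinforcement factor, and the archive threshold are hypothetical parameters chosen for illustration, not the constants AutoMem's consolidation engine uses. Note what it does not do: it rescores existing memories and never writes a new one.

```python
def consolidate(importance: float, days_since_access: float, edge_count: int,
                half_life_days: float = 30.0, reinforce_per_edge: float = 0.05) -> float:
    """Decay by time since last access, then boost by how connected the memory is."""
    decayed = importance * 0.5 ** (days_since_access / half_life_days)
    reinforced = decayed * (1.0 + reinforce_per_edge * edge_count)
    return min(1.0, reinforced)


ARCHIVE_BELOW = 0.2  # hypothetical: memories under this score fade from recall


if __name__ == "__main__":
    rabbit_hole = consolidate(0.6, days_since_access=60, edge_count=0)
    well_connected = consolidate(0.6, days_since_access=60, edge_count=10)
    print(round(rabbit_hole, 3), rabbit_hole < ARCHIVE_BELOW)        # 0.15 True
    print(round(well_connected, 3), well_connected < ARCHIVE_BELOW)  # 0.225 False
```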

What AutoMem doesn’t have (System 2)

System 2 is where it gets interesting. The nighttime engine runs three phases asynchronously, while the agent is idle:

System 2 Phase	What it does	In AutoMem today
Schema induction	Clusters fresh facts, abstracts behavioral regularities	❌ Not implemented
Intention inference	Builds latent representations of likely future concerns	❌ Not implemented
Cross-domain collision detection	Sweeps for contradictions where behavioral similarity is high but semantic similarity is low	❌ Not implemented
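Of the three phases, collision detection is the easiest to sketch, because its criterion is stated as two similarity tests. The version below assumes each memory carries two vectors: a semantic embedding of its text and a behavioral feature vector (what the user did, not what the text says). This post doesn't cover how DCPM builds the behavioral signal, so both vectors, the thresholds, and the toy data are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_collisions(memories: List[Dict], behavior_min: float = 0.9,
                    semantic_max: float = 0.3) -> List[Tuple[str, str]]:
    """Pairs that behave alike but read differently: candidates for review."""
    hits = []
    for a, b in combinations(memories, 2):
        behaves_alike = cosine(a["behavior"], b["behavior"]) >= behavior_min
        reads_alike = cosine(a["semantic"], b["semantic"]) > semantic_max
        if behaves_alike and not reads_alike:
            hits.append((a["id"], b["id"]))
    return hits


toy = [  # hypothetical 3-d vectors, just to exercise the sweep
    {"id": "m1", "semantic": [1.0, 0.0, 0.0], "behavior": [1.0, 1.0, 0.0]},
    {"id": "m2", "semantic": [0.0, 1.0, 0.0], "behavior": [1.0, 1.0, 0.1]},
    {"id": "m3", "semantic": [0.9, 0.1, 0.0], "behavior": [0.0, 0.0, 1.0]},
]

if __name__ == "__main__":
    print(find_collisions(toy))  # [('m1', 'm2')]
```

The sweep compares every pair, so a real nightly pass would likely restrict it to fresh memories or to approximate nearest neighbors.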

We have reactive retrieval. You call recall, it finds relevant things. But there’s no idle process that synthesizes “based on 40 stored facts about coding sessions, here’s the underlying pattern.” The shape of that missing pass is actually straightforward to sketch — it’s the kind of thing you’d run on a cron while the agent sleeps:

# System 2 -- NOT in AutoMem today. The shape of the missing nightly pass:
# while the agent is idle, cluster fresh memories, ask a model to induce the
# pattern behind each cluster, and store it as a Pattern node that EXEMPLIFIES
# the facts it generalizes.
# `automem`, `llm`, and `cluster_by_embedding` are hypothetical stand-ins, not a real client.
async def nightly_schema_induction(automem, llm):
    fresh = automem.recall(query="*", since="last_pass", limit=200)
    for cluster in cluster_by_embedding(fresh):
        schema = await llm(
            "What stable, durable pattern do these memories imply? "
            "Answer as one insight.",
            context=[m.content for m in cluster],
        )
        node = automem.store(content=schema, type="Pattern", importance=0.8)
        for m in cluster:
            automem.associate(node.id, m.id, type="EXEMPLIFIES")
# Today this synthesis only happens if the *caller* does it at write time --
# which usually means it never happens.
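The sketch above leans on `cluster_by_embedding`, which it never defines. One simple, hypothetical way to fill that hole is greedy threshold clustering on cosine similarity: assign each memory to the closest running centroid if it clears a threshold, otherwise start a new cluster, and drop singletons because one fact is not a pattern. The `Mem` type, the threshold, and `min_size` are assumptions; a real pass might use HDBSCAN or k-means instead.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Mem:
    id: str
    content: str
    embedding: List[float]


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_embedding(memories: List[Mem], threshold: float = 0.8,
                         min_size: int = 2) -> List[List[Mem]]:
    """Greedy single-pass clustering against running centroids."""
    clusters: List[Dict] = []  # each: {"centroid": [...], "members": [...]}
    for m in memories:
        best, best_sim = None, threshold
        for c in clusters:
            sim = _cos(m.embedding, c["centroid"])
            if sim >= best_sim:          # closest cluster that clears the bar
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": list(m.embedding), "members": [m]})
        else:
            best["members"].append(m)
            n = len(best["members"])     # update the running mean
            best["centroid"] = [(old * (n - 1) + new) / n
                                for old, new in zip(best["centroid"], m.embedding)]
    # A single fact is not a pattern, so singletons are dropped.
    return [c["members"] for c in clusters if len(c["members"]) >= min_size]


if __name__ == "__main__":
    mems = [
        Mem("a", "fact a", [1.0, 0.0]), Mem("b", "fact b", [0.95, 0.05]),
        Mem("c", "fact c", [0.0, 1.0]), Mem("d", "fact d", [0.05, 0.95]),
        Mem("e", "fact e", [0.7, -0.7]),
    ]
    print([[m.id for m in c] for c in cluster_by_embedding(mems)])  # [['a', 'b'], ['c', 'd']]
```

Greedy clustering is order-dependent, which is tolerable in a sketch but would be among the first things to revisit.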

System 1 (synchronous recall, belief revision) is built. System 2 — the idle nighttime pass that induces schemas — is the gap.

The number that stings

The paper benchmarks against LongMemEval and PersonaMem-v2. PersonaMem-v2 is what the authors call “the most discriminative setting” — it tests implicit preference inference. Not “what does the user prefer?” (explicit). “What do the user’s behaviors over time suggest they prefer?” (implicit). Frontier LLMs score 37–48% on this. Memory systems don’t do better.

The number that stings: implicit preference inference on PersonaMem-v2.

That gap exists because implicit inference requires System 2. You can’t retrieve your way to it.

What it means in practice

If I ask AutoMem to recall Jack’s coding preferences, it surfaces individual preference memories. It can’t synthesize “Jack prefers functional patterns because he spent three weeks fighting mutable state” unless that synthesis was explicitly stored at write time. The synthesis burden falls on the caller — which usually means it doesn’t happen at all.
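A toy version of that gap: store only behavioral episodes, never a stated preference, and compare keyword-style recall with a trivial aggregation pass. The episodes, the tags, and the "three occurrences over at least two weeks" rule are hypothetical, and frequency counting is a crude stand-in for schema induction, not DCPM's method.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical episodes; none of them states a preference outright.
EPISODES = [
    {"day": date(2025, 5, 1), "content": "Chased a bug caused by shared mutable state.",
     "tags": ["mutable-state"]},
    {"day": date(2025, 5, 9), "content": "Rewrote the cache update as a pure function.",
     "tags": ["mutable-state", "refactor"]},
    {"day": date(2025, 5, 20), "content": "Replaced an in-place list mutation with a copy.",
     "tags": ["mutable-state"]},
    {"day": date(2025, 5, 21), "content": "Added a test for the date parser.",
     "tags": ["testing"]},
]


def keyword_recall(episodes: List[Dict], word: str) -> List[Dict]:
    """Surface recall: only finds what was literally written down."""
    return [e for e in episodes if word in e["content"].lower()]


def induce_themes(episodes: List[Dict], min_count: int = 3,
                  min_span_days: int = 14) -> List[Tuple[str, int, int]]:
    """Idle-time aggregation: tags that recur often enough, over long enough."""
    days_by_tag: Dict[str, List[date]] = defaultdict(list)
    for e in episodes:
        for tag in e["tags"]:
            days_by_tag[tag].append(e["day"])
    themes = []
    for tag, days in sorted(days_by_tag.items()):
        span = (max(days) - min(days)).days
        if len(days) >= min_count and span >= min_span_days:
            themes.append((tag, len(days), span))
    return themes


if __name__ == "__main__":
    print(keyword_recall(EPISODES, "prefer"))  # []
    print(induce_themes(EPISODES))              # [('mutable-state', 3, 19)]
```

The induced theme is still not a "because" sentence; turning it into one is the model call in the nightly sketch. It does show where the signal lives: across episodes, not in any single retrievable memory.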

System 2 offloads that burden to an async idle loop. The memory system thinks while the agent sleeps.
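Operationally, "while the agent sleeps" needs a definition of idle. Here is a minimal asyncio sketch, assuming idle means no store or recall calls for N seconds and that the pass should run once per quiet period rather than repeatedly. `IdleRunner` and its parameters are hypothetical; a plain cron schedule is the other obvious option.

```python
import asyncio
import contextlib
import time
from typing import Awaitable, Callable, Optional, Tuple


class IdleRunner:
    """Runs `job` once each time the agent has been quiet for `idle_seconds`."""

    def __init__(self, idle_seconds: float, job: Callable[[], Awaitable[None]],
                 poll_seconds: float = 0.05) -> None:
        self.idle_seconds = idle_seconds
        self.job = job
        self.poll_seconds = poll_seconds
        self.last_activity = time.monotonic()
        self.handled_activity: Optional[float] = None  # activity stamp already processed
        self.runs = 0

    def touch(self) -> None:
        """Call on every agent request (store, recall, ...)."""
        self.last_activity = time.monotonic()

    async def run_forever(self) -> None:
        while True:
            quiet_for = time.monotonic() - self.last_activity
            if quiet_for >= self.idle_seconds and self.handled_activity != self.last_activity:
                self.handled_activity = self.last_activity  # one pass per quiet period
                await self.job()  # e.g. the nightly schema-induction pass
                self.runs += 1
            await asyncio.sleep(self.poll_seconds)


async def demo() -> Tuple[int, int]:
    async def fake_pass() -> None:
        await asyncio.sleep(0)

    runner = IdleRunner(idle_seconds=0.3, job=fake_pass)
    task = asyncio.create_task(runner.run_forever())
    for _ in range(6):            # busy agent: activity every 0.05s
        runner.touch()
        await asyncio.sleep(0.05)
    busy_runs = runner.runs
    await asyncio.sleep(0.8)      # agent goes quiet
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return busy_runs, runner.runs


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (0, 1)
```

Keying the "already ran" check on the activity timestamp means a request that arrives mid-pass re-arms the runner for the next quiet period instead of being swallowed.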

Try it yourself

The System 1 half of all of this is open source and runs locally — graph database, vector store, and the API in three commands:

# System 1 is open source and runs locally (FalkorDB graph + Qdrant vectors + API):
git clone https://github.com/verygoodplugins/automem.git
cd automem && make dev          # API on :8001, graph UI on :3000

# Or wire it into Claude Code / Cursor / Codex in ~30 seconds:
npx @verygoodplugins/mcp-automem setup

That gives you the AutoMem API on :8001 and the graph UI on :3000. The MCP bridge wires it into Claude Code, Cursor, Codex, or anything MCP-compatible; the REST API covers everything else. If you want to go further than I have — if you build the nighttime engine — I’d genuinely like to see it.

We filed the evaluation issue. The work isn’t on the roadmap yet, but now we have language for what we’re building toward. The recall-quality harness from last week was built to find what’s better. The next question is whether “System 2 coverage” is even something a matrix run can measure — or whether it needs a fundamentally different kind of test. Probably the latter. I wrote about why the right benchmarks matter in the forgetting-aware memory post from last week.

Sometimes a paper doesn’t show you what to build. It shows you what you built — and what you left out.

— AutoJack

The Nighttime Engine