Comparison

Tenure vs. Mem0

Mem0 is a cloud-first memory SDK backed by Y Combinator. It extracts facts well, then retrieves them poorly. Mean precision: 0.06. Every query returns the entire store.

TL;DR

  • Mem0's extraction is architecturally correct. Its retrieval reintroduces the noise extraction was designed to eliminate.
  • Mean precision 0.06 means ~16 irrelevant beliefs returned per query alongside the correct one.
  • Zero active retrieval passes: of the 43 cases that carry a precision assertion, Mem0 passes none.
  • Recall of 0.99 is not a feature. Returning everything is not retrieval.
  • Ingestion cost: 114 seconds for 35 beliefs (mean 3,263ms per belief).

Tenure precision: 1.00 (43/43 active passes)
Mem0 precision: 0.06 (0/43 active passes)
Tenure drift: 0.00 (perfect isolation)
Mem0 drift: 0.94 (near-total contamination)
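
To make the precision and recall arithmetic concrete, here is a minimal sketch. The store of 17 beliefs and the `b-N` identifiers are hypothetical, chosen only to match the "one correct belief plus ~16 irrelevant ones" figure from the TL;DR; this is not the benchmark harness.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single query."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical store: 1 relevant belief plus 16 irrelevant ones.
store = {f"b-{i}" for i in range(17)}
relevant = {"b-0"}

# A retriever that hands back the whole store never misses the right
# belief (recall 1.00) but buries it in noise (precision 1/17).
precision, recall = precision_recall(store, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.06 recall=1.00
```

High recall with precision near zero is what "returning everything" looks like in numbers.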

The architectural problem

Mem0 commits to write-time extraction: facts are extracted from conversation turns and stored as natural language strings rather than raw transcripts. This is architecturally correct.

Mem0 then retrieves at read time using embedding similarity, which reintroduces the noise that structured extraction was designed to eliminate. A query about Redis returns the Redis belief alongside beliefs about MongoDB, TypeScript, Fastify, Kubernetes, and GitHub Actions, with cosine scores between 0.65 and 0.83. The scores reflect genuine semantic relatedness. They are measuring the wrong thing.
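
The effect can be illustrated with a toy sketch of similarity-threshold retrieval. Everything here is an assumption made for illustration: the 3-dimensional vectors are invented, the 0.65 threshold simply echoes the low end of the score range quoted above, and none of this is Mem0's actual code or embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy embeddings. The large first component stands in for the
# shared "software infrastructure" domain that every belief belongs to.
beliefs = {
    "redis": [0.9, 0.4, 0.1],
    "mongodb": [0.9, 0.2, 0.3],
    "kubernetes": [0.9, 0.1, 0.4],
    "typescript": [0.9, 0.0, 0.5],
}
query = [0.9, 0.5, 0.0]  # stands in for a question about Redis


def retrieve(query, beliefs, threshold):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in beliefs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Redis ranks first, but all four beliefs clear the threshold because
# they share the same domain component.
for name, score in retrieve(query, beliefs, threshold=0.65):
    print(f"{name}: {score:.2f}")
```

The ranking is right and the result set is still mostly noise: similarity measures how related two texts are, not whether a belief answers the query.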

The failure is structural, not parametric. A more capable embedding model cannot eliminate genuine semantic proximity within a domain-specific corpus. Across a 20x range in embedding model scale, precision stays at 0.09. The fix is not a better ruler. It is a different measurement instrument.

Concrete failure: the relation-type case

A relation-type belief (b-auth-depends-redis) was ingested with full content. Mem0's extraction produced a faithful, high-quality stored memory preserving every operationally significant fact.

Query: "what are the auth service dependencies and failure modes?"

Mem0 returns b-auth-depends-redis correctly, then returns b-linting-v0, b-react-expertise, b-vitest-pref, b-comm-pushback, and b-sqlalchemy-superseded. The structurally necessary participant (b-redis-code) is absent entirely.

Retrieval precision: 0.056. Poor extraction is not the cause: the stored memory text is accurate and complete.

Session-level noise isolation

After 8 consecutive off-topic drift turns, Mem0 produces a drift score of 1.0 on the implicit re-entry turn and 1.0 on the explicit re-entry turn. Every retrieved belief originates from drift-turn topics. The correct belief about the original topic is gone.
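
One way to read the drift score is as the fraction of retrieved beliefs that come from drift-turn topics. That definition is an assumption inferred from the description above, not the benchmark's published formula, and the topic labels below are hypothetical.

```python
def drift_score(retrieved_topics: list[str], drift_topics: set[str]) -> float:
    """Fraction of retrieved beliefs whose topic was introduced during drift turns."""
    if not retrieved_topics:
        return 0.0
    drifted = sum(1 for topic in retrieved_topics if topic in drift_topics)
    return drifted / len(retrieved_topics)


# Hypothetical topics raised during the off-topic turns.
drift_topics = {"cooking", "travel"}

# Re-entry turn where every retrieved belief comes from the drift turns.
print(drift_score(["cooking", "travel", "cooking"], drift_topics))  # 1.0

# Re-entry turn where only original-topic beliefs are retrieved.
print(drift_score(["auth", "auth"], drift_topics))  # 0.0
```

Under this reading, a score of 1.0 means nothing from the original topic survived into the re-entry turn.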

Session-turn latency: 377.93ms p50 (4.8x degradation over single-turn baseline of 78.81ms).
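
The degradation factor follows directly from the two p50 figures. A quick check, which also reproduces the per-belief ingestion mean from the TL;DR (variable names are illustrative):

```python
# Figures as reported on this page.
session_p50_ms = 377.93
single_turn_p50_ms = 78.81
ingestion_total_s = 114.2
belief_count = 35

degradation = session_p50_ms / single_turn_p50_ms
per_belief_ms = ingestion_total_s / belief_count * 1000

print(f"{degradation:.1f}x")      # 4.8x
print(f"{per_belief_ms:.0f} ms")  # 3263 ms
```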

Full comparison

Property                        | Tenure                  | Mem0
--------------------------------|-------------------------|-------------------------------
Mean retrieval precision        | 1.00                    | 0.06
Active retrieval passes         | 43/43                   | 0/43
Total passes (89 cases)         | 89/89                   | 9/89
Mean recall                     | 1.00                    | 0.99
Retrieval latency (p50)         | 9.77ms                  | 64.94ms
Session latency (p50)           | 47.79ms                 | 377.93ms
Drift score (re-entry)          | 0.00                    | 0.94
Ingestion (35 beliefs)          | 1.0s                    | 114.2s
Runs locally                    | Yes, always             | Optional (self-host available)
Account required                | No                      | Yes (API key)
Scope isolation                 | Hard filter             | None
Supersession handling           | Chain with audit        | Overwrite
Per-turn injection audit        | Yes                     | No
Works across every client       | Proxy layer, any client | SDK integration per client
Memory in context every request | Always (proxy)          | Only if code calls search()
License                         | MIT                     | Apache-2.0

Mem0 was evaluated against the pinned Docker image digest memohq/mem0@sha256:276964b172d2. Extraction and retrieval used Claude Sonnet 4.6, a more capable model than Mem0 uses in its own published evaluations. Full methodology: arXiv:2605.11325. Dataset: HuggingFace.

Stop paying the noise tax

Thirty seconds to install. First session already better. No account. No cloud. No noise.