Mem0 is a cloud-first memory SDK backed by Y Combinator. It extracts facts well, then retrieves them poorly. Mean precision: 0.06. Every query returns the entire store.
Mem0 commits to write-time extraction: facts are extracted from conversation turns and stored as natural language strings rather than raw transcripts. This is architecturally correct.
Mem0 then retrieves at read time using embedding similarity, which reintroduces the noise that structured extraction was designed to eliminate. A query about Redis returns the Redis belief alongside beliefs about MongoDB, TypeScript, Fastify, Kubernetes, and GitHub Actions, with cosine scores between 0.65 and 0.83. The scores reflect genuine semantic relatedness. They are measuring the wrong thing.
The failure is structural, not parametric. A more capable embedding model cannot eliminate genuine semantic proximity within a domain-specific corpus. Tested across a 20x range in embedding model scale: precision stays at 0.09 regardless. The fix is not a better ruler. It is a different measurement instrument.
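The effect can be reproduced in miniature. The sketch below is not Mem0's code and uses no real embedding model: the vectors are hand-made toys in which the first component stands for a shared "software engineering" domain and the remaining components stand for individual topics. The names `BELIEFS`, `QUERY`, and `retrieve` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hypothetical, not real model output).
# Component 0 is the shared domain; components 1-3 are distinct topics.
BELIEFS = {
    "redis":      [1.0, 0.6, 0.0, 0.0],
    "mongodb":    [1.0, 0.0, 0.6, 0.0],
    "typescript": [1.0, 0.0, 0.0, 0.6],
}
QUERY = [1.0, 0.6, 0.0, 0.0]  # a query about Redis

def retrieve(query, beliefs, threshold):
    """Return every belief whose cosine score meets the threshold."""
    scored = {name: cosine(query, vec) for name, vec in beliefs.items()}
    return {name: s for name, s in scored.items() if s >= threshold}

hits = retrieve(QUERY, BELIEFS, threshold=0.65)
for name, score in sorted(hits.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this toy setup the two off-topic beliefs score about 0.74 purely because of the shared domain component, so a 0.65 threshold admits all three. Raising the threshold only trades precision against recall; it does not change what the score measures.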
A relation-type belief (b-auth-depends-redis) was ingested with full content. Mem0's extraction produced a faithful, high-quality stored memory preserving every operationally significant fact.
Query: "what are the auth service dependencies and failure modes?"
Mem0 returns b-auth-depends-redis correctly, then returns b-linting-v0, b-react-expertise, b-vitest-pref, b-comm-pushback, and b-sqlalchemy-superseded. The structurally necessary participant (b-redis-code) is absent entirely.
Retrieval precision: 0.056. Poor extraction is not the cause: the stored memory text is accurate and complete.
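For reference, retrieval precision is the fraction of returned items that are relevant. The sketch below assumes the standard set-based definition; the exact scoring code used in the evaluation is not shown here. The belief ids and the count of 18 returned items are hypothetical, chosen only because one relevant item among 18 gives a value consistent with the reported 0.056.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 if nothing retrieved)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved (0.0 if nothing is relevant)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical result set: one relevant belief plus 17 unrelated ones.
relevant_ids = {"b-relevant"}
retrieved_ids = ["b-relevant"] + [f"b-noise-{i}" for i in range(17)]

p = precision(retrieved_ids, relevant_ids)
r = recall(retrieved_ids, relevant_ids)
print(f"precision={p:.3f} recall={r:.2f}")
```

This also shows why near-perfect recall and very low precision can coexist: returning almost everything guarantees the relevant item is present while burying it in noise.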
After 8 consecutive off-topic drift turns, Mem0 produces a drift score of 1.0 on the implicit re-entry turn and 1.0 on the explicit re-entry turn. Every retrieved belief originates from drift-turn topics. The correct belief about the original topic is gone.
Session-turn latency: 377.93ms p50, a 4.8x degradation over the single-turn baseline of 78.81ms.
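One plausible reading of the drift score, assumed here rather than taken from the methodology, is the fraction of retrieved beliefs that belong to drift-turn topics; under that reading, "every retrieved belief originates from drift-turn topics" yields exactly 1.0. The function name and topic labels below are hypothetical. The last lines simply check the latency ratio quoted above.

```python
def drift_score(retrieved_topics, drift_topics):
    """Share of retrieved beliefs whose topic came from the drift turns.

    Assumed definition for illustration; 0.0 when nothing was retrieved.
    """
    if not retrieved_topics:
        return 0.0
    drifted = sum(1 for topic in retrieved_topics if topic in drift_topics)
    return drifted / len(retrieved_topics)

# Hypothetical topic labels for a session that wandered off the original topic.
drift_topics = {"cooking", "travel"}
all_drift = ["cooking", "travel", "cooking"]    # original topic lost
no_drift = ["auth-service", "auth-service"]     # original topic recovered

print(drift_score(all_drift, drift_topics))
print(drift_score(no_drift, drift_topics))

# Latency figures from the text: session-turn p50 over single-turn p50.
ratio = 377.93 / 78.81
print(f"{ratio:.1f}x")
```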
| Property | Tenure | Mem0 |
|---|---|---|
| Mean retrieval precision | 1.00 | 0.06 |
| Active retrieval passes | 43/43 | 0/43 |
| Total passes (89 cases) | 89/89 | 9/89 |
| Mean recall | 1.00 | 0.99 |
| Retrieval latency (p50) | 9.77ms | 64.94ms |
| Session latency (p50) | 47.79ms | 377.93ms |
| Drift score (re-entry) | 0.00 | 0.94 |
| Ingestion (35 beliefs) | 1.0s | 114.2s |
| Runs locally | Yes, always | Optional (self-host available) |
| Account required | No | Yes (API key) |
| Scope isolation | Hard filter | None |
| Supersession handling | Chain with audit | Overwrite |
| Per-turn injection audit | Yes | No |
| Works across every client | Proxy layer, any client | SDK integration per client |
| Memory in context every request | Always (proxy) | Only if code calls search() |
| License | MIT | Apache-2.0 |
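The "Scope isolation" and "Supersession handling" rows describe behaviours that can be sketched in a few lines. The code below is an illustrative model only, not Tenure's or Mem0's actual implementation: the `Belief` and `Store` classes, their fields, and the ids and scope names are all hypothetical. It shows a hard scope filter applied before any ranking, and a supersession chain that keeps replaced beliefs for audit instead of overwriting them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Belief:
    id: str
    scope: str
    text: str
    superseded_by: Optional[str] = None  # id of the belief that replaced this one

class Store:
    def __init__(self) -> None:
        self.beliefs: Dict[str, Belief] = {}

    def add(self, belief: Belief, supersedes: Optional[str] = None) -> None:
        """Add a belief; optionally mark an older belief as replaced by it."""
        self.beliefs[belief.id] = belief
        if supersedes is not None:
            # Keep the old record and link it forward instead of overwriting.
            self.beliefs[supersedes].superseded_by = belief.id

    def active(self, scope: str) -> List[Belief]:
        """Hard filter: only current beliefs in the requested scope are candidates."""
        return [
            b for b in self.beliefs.values()
            if b.scope == scope and b.superseded_by is None
        ]

    def history(self, belief_id: str) -> List[str]:
        """Follow the supersession chain forward from a belief, for audit."""
        chain = [belief_id]
        while self.beliefs[chain[-1]].superseded_by is not None:
            chain.append(self.beliefs[chain[-1]].superseded_by)
        return chain

store = Store()
store.add(Belief("b-orm-v1", "project-a", "Uses ORM X"))
store.add(Belief("b-orm-v2", "project-a", "Migrated to ORM Y"), supersedes="b-orm-v1")
store.add(Belief("b-lint", "project-b", "Prefers strict linting"))

print([b.id for b in store.active("project-a")])
print(store.history("b-orm-v1"))
```

Because the scope check is an equality test rather than a similarity score, a belief from another scope cannot enter the candidate set however close its text is to the query.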
Mem0 was evaluated against the pinned Docker image digest memohq/mem0@sha256:276964b172d2. Extraction and retrieval used Claude Sonnet 4.6, a more capable model than the one Mem0 uses in its own published evaluations. Full methodology: arXiv:2605.11325. Dataset: HuggingFace.
Thirty seconds to install. First session already better. No account. No cloud. No noise.