Comparison

How Tenure compares

Name: PrecisionMemBench
Creator: Tenure
License: MIT (https://opensource.org/licenses/MIT)

Reproducible benchmark results from PrecisionMemBench, a set of 89 cases that measure retrieval precision independently of any generative model. Run it yourself on Hugging Face.
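Per-case retrieval precision and recall can be sketched as follows. This is a minimal sketch, not PrecisionMemBench's harness. It assumes precision is the share of returned beliefs that are relevant, recall is the share of relevant beliefs that were returned, and scores are macro-averaged over cases. The `RetrievalCase` shape, the function names, and the rule that an empty result scores 0 are all hypothetical, and the benchmark's pass criterion is not modeled.

```typescript
// Hypothetical shape of one benchmark case: the belief IDs a memory system
// returned for a query, and the IDs judged relevant to that query.
interface RetrievalCase {
  retrieved: string[];
  relevant: string[];
}

// Precision: share of returned beliefs that are relevant.
// An empty result scores 0 here (an assumption, not the benchmark's rule).
function precision(c: RetrievalCase): number {
  if (c.retrieved.length === 0) return 0;
  const relevant = new Set(c.relevant);
  const hits = c.retrieved.filter((id) => relevant.has(id)).length;
  return hits / c.retrieved.length;
}

// Recall: share of relevant beliefs that were returned.
function recall(c: RetrievalCase): number {
  if (c.relevant.length === 0) return 1;
  const returned = new Set(c.retrieved);
  const found = c.relevant.filter((id) => returned.has(id)).length;
  return found / c.relevant.length;
}

// Macro-average: score each case independently, then take the mean.
function meanOver(cases: RetrievalCase[], metric: (c: RetrievalCase) => number): number {
  return cases.reduce((sum, c) => sum + metric(c), 0) / cases.length;
}

// Illustrative case: the one relevant belief comes back with 16 irrelevant ones.
const noisy: RetrievalCase = {
  retrieved: ["b1", ...Array.from({ length: 16 }, (_, i) => `noise${i}`)],
  relevant: ["b1"],
};

console.log(precision(noisy).toFixed(2)); // 0.06
console.log(recall(noisy)); // 1
```

The illustrative case shows how a system can reach perfect recall while precision drops to about 0.06. One relevant belief among 17 returned gives 1/17.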

Tenure precision

1.00

89/89 cases passed

Best competitor

0.43

Supermemory: 44/77 non-session cases

Median competitor

0.06

10-18 irrelevant beliefs per query

Tenure latency

<15ms

p50: 9.77ms
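A p50 figure is the median of per-query latency samples. The page does not say how its percentiles are computed, so the sketch below assumes the nearest-rank method. The `percentile` function and the sample values are hypothetical and are not benchmark data.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. Expects p in [0, 100].
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical per-query latencies in milliseconds.
const samples = [12, 8, 10, 15, 9];
console.log(percentile(samples, 50)); // 10
console.log(percentile(samples, 90)); // 15
```

Nearest-rank always returns an observed sample. Interpolating methods can report a value that no query actually took, so two tools can report slightly different p50s for the same data.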

Choose a comparison

Tenure vs. Mem0

Precision: 0.06 · Active passes: 0/43

Cloud-first memory SDK. Good extraction, broken retrieval. A recall of 0.99 means nearly every relevant belief comes back, but so do 16 irrelevant beliefs per query.

Tenure vs. Zep

Precision: 0.09 · Active passes: 0/43

Temporal knowledge graph. Ingesting 35 beliefs took 897 seconds. Multi-container deployment. Drift score: 0.89 on re-entry turns.

Tenure vs. Hindsight

Precision: 0.06 · Active passes: 0/43

MCP-based agent memory from Vectorize. Cross-encoder reranker adds 672ms mean latency without improving precision. Drift score: 0.93.

Tenure vs. Supermemory

Precision: 0.43 · Active passes: 17/43

The best-performing competitor. Memory graph with dynamic dreaming. Still returns 2-3 irrelevant beliefs per correct one on average.

Tenure vs. Agentmemory

Precision: 0.17 · Active passes: 0/43

Zero-database memory runtime for coding agents. Triple-stream retrieval (BM25 + vector + graph). High recall, low precision. Drift score: 0.81.

Tenure vs. GBrain

Precision: 0.14 · Active passes: 5/43

Garry Tan's opinionated agent brain. Hybrid search + knowledge graph + synthesis layer. Precision is limited by its default of uncapped retrieval.
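An uncapped retrieval default limits precision because every weak match that clears the search is returned along with the strong ones. The sketch below illustrates this effect with a top-k cap applied to a ranked result list. It is not GBrain's code, and the ranked list, the IDs, and the function names are hypothetical.

```typescript
// A hypothetical ranked retrieval result: belief IDs, best match first.
type Ranked = string[];

// Fraction of the returned beliefs that are relevant (0 for an empty result).
function precisionOf(returned: Ranked, relevant: Set<string>): number {
  if (returned.length === 0) return 0;
  return returned.filter((id) => relevant.has(id)).length / returned.length;
}

// Keep only the top-k results. slice(0, Infinity) returns every result,
// which models an uncapped default.
function cap(results: Ranked, k: number): Ranked {
  return results.slice(0, k);
}

const relevant = new Set(["b1", "b2"]);
// Both relevant beliefs rank first, followed by 12 weaker matches.
const ranked: Ranked = ["b1", "b2", ...Array.from({ length: 12 }, (_, i) => `weak${i}`)];

console.log(precisionOf(cap(ranked, Infinity), relevant).toFixed(2)); // 0.14
console.log(precisionOf(cap(ranked, 2), relevant)); // 1
```

A cap only helps when relevant beliefs rank near the top. If a relevant belief sits below position k, capping trades recall for precision.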

All results are from PrecisionMemBench (arXiv:2605.11325). The dataset is on Hugging Face, and a live leaderboard runs on Hugging Face Spaces. Run it yourself: npm run test:eval