Comparison

Tenure vs. Hindsight

Name: precisionMemBench
Creator: Tenure
License: https://opensource.org/licenses/MIT

Hindsight is Vectorize's MCP-based agent memory. It bundles a cross-encoder reranker that adds 672ms mean latency without improving precision past 0.06.

TL;DR

Cross-encoder reranker is the architectural feature designed to address precision. It doesn't close the gap.
Mean precision 0.06 with zero active retrieval passes. On alias resolution cases, returns 18 beliefs with the correct one absent entirely.
672ms mean single-turn latency. Session turns: 2,736ms mean, p95 above 6,000ms.
Drift score 1.0 on implicit re-entry: correct belief missing, 100% noise.
Full image bundles ~130MB BGE embedder + ~90MB MiniLM cross-encoder. 3-4GB RAM minimum.

Tenure precision

1.00

43/43 active passes

Hindsight precision

0.06

0/43 active passes

Tenure latency

9.77ms

p50 retrieval

Hindsight latency

589.86ms

p50 retrieval

The reranker doesn't fix it

Hindsight's cross-encoder reranker is the architectural feature that most distinguishes it from a standard embedding retrieval stack. The benchmark results show it does not close the precision gap.

On alias resolution cases (a query using k8s to retrieve the Kubernetes entity belief), Hindsight returns 18 beliefs with precision 0 and recall 0: the correct belief does not appear at all. This is not a ranking failure; it is a retrieval failure. The correct belief is absent from the candidate set before the reranker runs.

You cannot rerank your way out of a retrieval failure. A cross-encoder can only reorder candidates that are already in the result set. If the initial retrieval stage misses the correct belief entirely, no amount of reranking recovers it. The fix must happen at the primary retrieval signal, not downstream.

MCP: the model has to decide to call it

Hindsight relies on MCP for memory access. The model must decide to call the memory tool. If it does not recognize that it needs prior information, it does not ask. Your memory sits idle while the model hallucinates decisions you already made.

Tenure sits between your client and your provider as a proxy. Every request is enriched before it reaches the model. No tool call to trigger. No prompt engineering required. Your beliefs are in context on every single request, automatically.

Full comparison

Property	Tenure	Hindsight
Mean retrieval precision	1.00	0.06
Active retrieval passes	43/43	0/43
Retrieval latency (p50)	9.77ms	589.86ms
Session latency (mean)	49.02ms	2,735.84ms
Session latency (p95)	134.63ms	6,162.57ms
Drift score	0.00	0.93
Ingestion (35 beliefs)	1.0s	173.3s
Memory in context	Every request (proxy)	Only when model calls MCP tool
RAM requirement	~512MB	3-4GB (BGE + cross-encoder)
Deployment	Single container	Single container (heavy)
Scope isolation	Hard filter	None
Per-turn injection audit	Yes	No
License	MIT	MIT

Hindsight evaluated against pinned Docker image digest ghcr.io/vectorize-io/hindsight@sha256:f0f9e9a73d6a. Full methodology: arXiv:2605.11325. Dataset: HuggingFace.

Tenure vs. Hindsight

TL;DR

The reranker doesn't fix it

MCP: the model has to decide to call it

Full comparison

68x faster. Actually precise.