Comparison

Tenure vs. Hindsight

Hindsight is Vectorize's MCP-based agent memory. It bundles a cross-encoder reranker that adds 672ms mean latency without improving precision past 0.06.

TL;DR

  • Cross-encoder reranker is the architectural feature designed to address precision. It doesn't close the gap.
  • Mean precision 0.06 with zero active retrieval passes. On alias resolution cases, returns 18 beliefs with the correct one absent entirely.
  • 672ms mean single-turn latency. Session turns: 2,736ms mean, p95 above 6,000ms.
  • Drift score 1.0 on implicit re-entry: correct belief missing, 100% noise.
  • Full image bundles ~130MB BGE embedder + ~90MB MiniLM cross-encoder. 3-4GB RAM minimum.
Tenure precision
1.00
43/43 active passes
Hindsight precision
0.06
0/43 active passes
Tenure latency
9.77ms
p50 retrieval
Hindsight latency
589.86ms
p50 retrieval

The reranker doesn't fix it

Hindsight's cross-encoder reranker is the architectural feature that most distinguishes it from a standard embedding retrieval stack. The benchmark results show it does not close the precision gap.

On alias resolution cases (a query using k8s to retrieve the Kubernetes entity belief), Hindsight returns 18 beliefs with precision 0 and recall 0: the correct belief does not appear at all. This is not a ranking failure; it is a retrieval failure. The correct belief is absent from the candidate set before the reranker runs.

You cannot rerank your way out of a retrieval failure. A cross-encoder can only reorder candidates that are already in the result set. If the initial retrieval stage misses the correct belief entirely, no amount of reranking recovers it. The fix must happen at the primary retrieval signal, not downstream.

MCP: the model has to decide to call it

Hindsight relies on MCP for memory access. The model must decide to call the memory tool. If it does not recognize that it needs prior information, it does not ask. Your memory sits idle while the model hallucinates decisions you already made.

Tenure sits between your client and your provider as a proxy. Every request is enriched before it reaches the model. No tool call to trigger. No prompt engineering required. Your beliefs are in context on every single request, automatically.

Full comparison

Property Tenure Hindsight
Mean retrieval precision 1.00 0.06
Active retrieval passes 43/43 0/43
Retrieval latency (p50) 9.77ms 589.86ms
Session latency (mean) 49.02ms 2,735.84ms
Session latency (p95) 134.63ms 6,162.57ms
Drift score 0.00 0.93
Ingestion (35 beliefs) 1.0s 173.3s
Memory in context Every request (proxy) Only when model calls MCP tool
RAM requirement ~512MB 3-4GB (BGE + cross-encoder)
Deployment Single container Single container (heavy)
Scope isolation Hard filter None
Per-turn injection audit Yes No
License MIT MIT

Hindsight evaluated against pinned Docker image digest ghcr.io/vectorize-io/hindsight@sha256:f0f9e9a73d6a. Full methodology: arXiv:2605.11325. Dataset: HuggingFace.

68x faster. Actually precise.

No MCP tool call required. No reranker to wait on. No 6-second session turns. Thirty seconds to install.