How the alias enrichment flywheel works

BM25 retrieval is only as good as the alias set it searches against. The flywheel is how that set grows: passively, continuously, and without any effort from the user.

Tenure research · ~6 min read

TL;DR

  • Tenure uses BM25 over typed aliases instead of vector similarity. This gives precision 1.0 but requires the alias set to contain the terms a user will actually query.
  • At extraction time, the system seeds each belief with 3 to 5 aliases: the canonical name plus the surface forms present in that session.
  • Every subsequent session is an observation of how the user refers to their beliefs. New surface forms are captured and added continuously.
  • Vocabulary is never lost to compaction. When beliefs merge, every alias that retrieved either belief continues to retrieve the merged one.
  • The alias set includes counter-signals: terms for things the user has moved away from, so a query referencing the old tool surfaces the belief about the new one.
  • The result is a flywheel that runs opposite to vector search: retrieval gets better as the store grows instead of degrading.

The objection

The standard argument against BM25 for memory

When Tenure's retrieval design comes up, the first objection is usually some version of this: BM25 only matches terms that are already in the index. A user who refers to a belief using a synonym or a surface form that wasn't captured at extraction time gets nothing back. Vector search would have caught it. BM25 fails silently.

The objection is correct as a static description of BM25. It is wrong as a practical description of how Tenure's retrieval works, because the alias set is not static.
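The static version of the objection is easy to reproduce. The following is a minimal sketch, assuming each belief is indexed as a bag of lowercase alias tokens and scored with standard Okapi BM25. The function names, the index layout, the second example belief, and the `k1` and `b` defaults are illustrative assumptions, not Tenure's actual implementation.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_search(index, query, k1=1.5, b=0.75):
    """Rank beliefs by Okapi BM25 over their alias text.

    `index` maps a belief name to its alias text (hypothetical layout).
    Beliefs that share no term with the query are left out entirely.
    """
    docs = {name: tokenize(text) for name, text in index.items()}
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: how many beliefs contain each term.
    df = Counter(term for toks in docs.values() for term in set(toks))

    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

index = {
    "redis_session_store": "redis session store cache layer redis instance session backend",
    "github_actions_ci": "github actions ci pipeline build workflow",
}

hit = bm25_search(index, "redis instance")   # terms already in the alias set
miss = bm25_search(index, "login state")     # surface form never observed
```

The second query comes back empty because none of its terms have been indexed yet. That is the cold-start miss the objection describes, and it only persists if the alias set never changes.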

The deeper issue is that the objection frames a cold-start limitation as a permanent one. A vector system that matches a synonym on first query does so by accepting noise across the entire semantic neighborhood: it catches "login" when you meant "authentication," but it also returns every other belief that occupies that semantic region. You cannot have zero-shot synonym matching without paying the semantic noise tax. The flywheel trades one for the other: a brief cold-start miss on first encounter, then exact matching thereafter, with no noise accumulation.

A vector system that catches a synonym on re-entry does so while dragging the semantic mass of preceding off-topic turns into the context window. The noise tax, once paid, cannot be recovered. The alias flywheel resolves the cold-start miss organically through use.

Extraction time

What the alias set looks like when a belief is born

When a belief is first extracted from a conversation turn, the extracting model seeds it with a canonical name and 3 to 5 aliases. These are drawn from the vocabulary present in that session: the exact terms, abbreviations, and shorthand the user used when establishing the fact.

A developer who establishes that their project uses Redis for session storage might say "Redis," "the session store," "our cache layer," and "the Redis instance" across several turns of the same conversation. The extraction worker sees all of those. The resulting belief enters the store with a canonical name of redis_session_store and an alias set that already covers the surface forms observed so far.

entity: redis_session_store
aliases: redis, session store, cache layer, redis instance, session backend
content: Redis handles session storage. Fails closed on outage: denies all requests if Redis is down.
why it matters: Shapes all failure-mode analysis involving auth or sessions. Cannot discuss auth resilience without addressing Redis availability.
epistemic status: active
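
A record like the one above could be modeled as a small data class. This is a sketch under stated assumptions: the field names mirror the labels in the record, while `Belief`, `seed_belief`, and the normalization rule (lowercase, collapsed whitespace) are hypothetical. The cap of 5 seeded aliases comes from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    entity: str
    content: str
    why_it_matters: str
    epistemic_status: str = "active"
    aliases: list[str] = field(default_factory=list)

def seed_belief(entity, content, why_it_matters, session_terms, max_seed=5):
    """Seed a new belief with surface forms observed in the extracting session."""
    aliases = []
    for term in session_terms:
        normalized = " ".join(term.lower().split())  # assumed normalization
        if normalized and normalized not in aliases:
            aliases.append(normalized)
    return Belief(entity, content, why_it_matters, aliases=aliases[:max_seed])

belief = seed_belief(
    "redis_session_store",
    "Redis handles session storage.",
    "Shapes all failure-mode analysis involving auth or sessions.",
    ["Redis", "session store", "cache layer", "Redis instance", "session backend"],
)
```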

At this point the alias set has 5 entries and covers the vocabulary from one session. That is where a static system would stop. The flywheel is what happens next.

Session by session

Every session is an observation

Each new session involving that belief is an opportunity to observe how the user refers to it now, in a different context, possibly weeks later. If the user says "the session layer" in a new session and retrieval surfaces redis_session_store via an existing alias match, "session layer" is captured as a new surface form and added to the alias set. The next time the user says "session layer," it matches directly.

If the user says something that does not yet match any alias and retrieval returns nothing, the extraction worker still observes the term in the context of the surrounding conversation. When the user clarifies or follows up in a way that makes the referent clear, that surface form gets captured and linked to the right belief. The miss is temporary. The coverage gain is permanent.

This is not a learning algorithm or a reranking pass. It is a straightforward append operation: new surface forms observed during sessions are added to the alias field of the belief they describe, up to a hard ceiling of 25 aliases. The ceiling exists to keep the index compact; in practice, a belief that has been in the store for months accumulates 10 to 15 aliases covering the range of ways a user naturally refers to that concept.
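
The append operation described above can be sketched in a few lines. The ceiling of 25 comes from the text. The normalization rule and the choice to drop new terms once the ceiling is reached (rather than evict older ones) are assumptions, and `add_alias` is a hypothetical name.

```python
MAX_ALIASES = 25  # hard ceiling stated in the text

def normalize(term):
    return " ".join(term.lower().split())

def add_alias(aliases, surface_form, ceiling=MAX_ALIASES):
    """Append a newly observed surface form; return True if it was added."""
    term = normalize(surface_form)
    if not term or term in aliases:
        return False  # nothing new to record
    if len(aliases) >= ceiling:
        return False  # assumed policy: drop the term once the ceiling is hit
    aliases.append(term)
    return True

aliases = ["redis", "session store", "cache layer", "redis instance", "session backend"]
added_first = add_alias(aliases, "Session Layer")   # new surface form
added_again = add_alias(aliases, "session layer")   # already captured
```

Because the operation is idempotent, the extraction worker can call it on every observation without tracking which surface forms it has already seen.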

Alias accumulation over sessions

  • Session 1: redis · session store · cache layer · redis instance · session backend
    Seeded at extraction from session vocabulary.
  • Session 4: redis · session store · cache layer · redis instance · session backend · session layer · auth cache
    Two new surface forms observed and captured.
  • Session 11: redis · session store · cache layer · redis instance · session backend · session layer · auth cache · the cache · valkey · dragonfly
    Counter-signals added after a migration consideration.

Counter-signals

The alias set includes what you moved away from

A belief is not just about what is currently true. It carries the history of how the user got there. When a developer migrates from one tool to another, the old tool's name becomes a counter-signal: a term that, when queried, should surface the belief about the replacement.

This matters more than it sounds. A developer who switched from Valkey to Redis six months ago may not remember which one they currently use. A query containing "valkey" should not return silence or a superseded belief. It should surface the active Redis belief, because "valkey" is now an alias pointing at it. The system surfaces what the user actually does when a query references what they used to do.

Counter-signals are also how supersession chains stay navigable. When a belief is superseded, the old canonical name and aliases do not disappear from the index. They are reassigned as counter-signals on the replacement belief. A query using the old terminology routes to the active belief, not to a dead end.
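
One way to picture the reassignment is the sketch below. It assumes counter-signals are stored in a separate field next to the ordinary aliases and that only active beliefs are indexed. Neither detail is confirmed by the text, and `Belief`, `supersede`, and `retrievable_terms` are hypothetical names. The alias ceiling is ignored here for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    entity: str
    aliases: list[str] = field(default_factory=list)
    counter_signals: list[str] = field(default_factory=list)
    epistemic_status: str = "active"

def supersede(old, new):
    """Mark `old` as superseded and route its vocabulary to `new`."""
    inherited = [old.entity, *old.aliases, *old.counter_signals]
    for term in inherited:
        if term not in new.aliases and term not in new.counter_signals:
            new.counter_signals.append(term)
    old.epistemic_status = "superseded"
    return new

def retrievable_terms(belief):
    """Terms indexed for an active belief: aliases plus counter-signals."""
    return belief.aliases + belief.counter_signals

old = Belief("valkey_session_store", ["valkey", "session store"])
new = Belief("redis_session_store", ["redis", "session store"])
supersede(old, new)
```

After the call, a query containing "valkey" matches `redis_session_store`, and counter-signals inherited from earlier links in a chain are carried forward the same way.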

Compaction

Vocabulary is never lost when beliefs merge

Over time, a belief store accumulates redundant and overlapping beliefs. Compaction merges them. The precision question is: what happens to the alias sets of the merged beliefs?

The answer is that they are unioned into the surviving belief. Every term that previously retrieved either belief continues to retrieve the merged one. No vocabulary coverage is lost. If one version of a belief had accumulated the alias "session layer" through months of use, and the other had accumulated "auth cache," the merged belief has both, along with everything else from both alias sets up to the ceiling.

This means compaction is not a precision risk; it is a precision improvement. After a merge, a single belief covers more query surface than either of its predecessors. The store gets smaller and more findable at the same time.

  redis_sessions_v1: redis · session store · cache layer
+ redis_auth_cache: auth cache · session layer · the cache
↓ compaction
  redis_session_store: redis · session store · cache layer · auth cache · session layer · the cache

All aliases from both beliefs. No vocabulary lost.
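
The merge rule amounts to an order-preserving union. A minimal sketch, assuming aliases are already normalized strings and that anything beyond the ceiling is truncated from the end. The text says only "up to the ceiling", so the truncation policy and the name `merge_aliases` are assumptions.

```python
def merge_aliases(first, second, ceiling=25):
    """Union two alias lists, keeping first-seen order, up to the ceiling."""
    merged = []
    for term in [*first, *second]:
        if term not in merged:
            merged.append(term)
    return merged[:ceiling]  # assumed policy when the union exceeds the ceiling

redis_sessions_v1 = ["redis", "session store", "cache layer"]
redis_auth_cache = ["auth cache", "session layer", "the cache"]
merged = merge_aliases(redis_sessions_v1, redis_auth_cache)
```
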
Why it compounds

The flywheel runs opposite to vector search

Vector search degrades as a belief store grows. More beliefs means more semantic mass. A corpus of 50 beliefs about a single technical domain creates a dense semantic region where cosine distances between unrelated beliefs collapse toward each other. A query about Redis returns beliefs about Kubernetes, TypeScript, and GitHub Actions because they all occupy the same semantic neighborhood. Precision at 50 beliefs is lower than precision at 10 beliefs, and continues declining as the store grows.

The alias flywheel runs in the opposite direction. More sessions means more observed surface forms. A richer alias set means higher precision on the vocabulary that is actually used. A store that has been in use for months is more findable than a store that was provisioned yesterday. The flywheel is the mechanism that makes this property hold structurally rather than aspirationally.

Vector search

  • 10 beliefs: precision moderate
  • 50 beliefs: precision low
  • 200 beliefs: precision 0.05

Precision degrades as semantic mass accumulates.

Alias BM25 + flywheel

  • 1 session: coverage partial
  • 10 sessions: coverage good
  • 50 sessions: coverage full

Coverage improves as vocabulary is observed. Precision holds at 1.0 throughout.

The linguistic basis

Why single users are the strongest case

The flywheel works because of a property of individual language production that corpus linguistics has documented carefully. Single speakers maintain stable, distinctive lexical choices across production contexts. Idiolectal patterns are consistent over periods of one to two years. Lexical priming formalizes the mechanism: words become entrained through use, and speakers reliably return to the same lexical choices in the same topical contexts.

A single-user belief store is precisely the setting where these properties are strongest: the query author and the belief author are the same person. The terms a user uses to refer to their Redis instance today are the terms they will use to refer to it next month. The alias set is not a probabilistic approximation of the user's vocabulary; it is a direct record of it.

The same property extends to any context where vocabulary has converged: engineering teams with shared codebases, runbooks, and internal terminology. When a team consistently calls their session store "the cache layer," that term appears in every engineer's queries and in every extraction. The alias set for the relevant belief accumulates it quickly, and from that point resolves correctly for every team member.

Summary

A precision guarantee that gets stronger with use

The alias enrichment flywheel is the answer to the standard objection against BM25 for memory retrieval. It does not eliminate the cold-start limitation. A brand new belief with a brand new alias set will miss surface forms it has not yet observed. What it does is make that limitation temporary and self-correcting without any user effort.

Every session adds observations. Every compaction preserves vocabulary. Every supersession converts old terminology into a counter-signal. The store becomes more findable with each session, and precision holds at 1.0 throughout on the vocabulary that has been observed.

Vector search offers zero-shot synonym matching at the cost of a noise floor that compounds with scale. The flywheel offers exact matching that improves with scale. At the corpus sizes relevant to persistent personal or team memory, the tradeoff is not close.
