You are looking for a place to swim in warm weather. You do not know the address. Instead, you walk into a city where the street layout encodes meaning. You ask a local: “Where can I swim somewhere warm?” She points you to a neighborhood. You walk there. The neighborhood has hotels with pools, beaches near tropical water, rivers in sunny climates. You explore. You find what you are looking for. The city map was built by observing how people navigate; the streets are organized by what people look for, not by what buildings contain.
Vector search works like navigating a conceptual city. Documents live at addresses defined by their meaning. Queries land you in neighborhoods of related concepts. You walk the neighborhood and collect candidates. The quality of the walk depends on whether the neighborhood actually contains what you want. If the city was built from tourist brochures but you are looking for where locals swim, the neighborhood will mislead you.
The embedding model determines the city geography. Your query gets embedded, placing it at a coordinate. Vector search finds the nearest stored document embeddings. “Nearest” is measured by distance in the high-dimensional space: cosine similarity, Euclidean distance, or dot product depending on the system and the data. The closest documents are returned as candidates. The distance measure determines what “near” means; choose it based on your data characteristics.
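The nearest-neighbor step can be sketched in a few lines of plain Python. This is a toy with invented 3-dimensional vectors and document ids (real embeddings have hundreds or thousands of dimensions), using cosine similarity as the distance measure:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, top_k=2):
    # Rank stored document vectors by similarity to the query vector.
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

# Toy 3-d "embeddings"; ids and values are invented for illustration.
docs = {
    "hotel-pools": [0.9, 0.1, 0.0],
    "tax-law":     [0.0, 0.2, 0.9],
    "beach-guide": [0.8, 0.3, 0.1],
}
print(nearest([1.0, 0.2, 0.0], docs))  # → ['hotel-pools', 'beach-guide']
```

Swapping the `cosine_similarity` function for Euclidean distance or a raw dot product changes what "near" means without changing anything else, which is exactly the choice the paragraph above describes.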
The critical thing about neighborhoods is that they are not designed, they are learned. The embedding model trains on text data and learns which concepts co-occur, which words appear in similar contexts, which phrases tend to cluster. The geography is a compressed summary of statistical patterns in the training data. This means the neighborhood structure reflects the data the model saw. A model trained on medical literature will have different neighborhoods than one trained on news articles, even for the same underlying concepts. The map was drawn from a specific perspective; it is not neutral.
This has a practical consequence that surprises teams regularly. A query about “banking” in a financial document corpus might return documents about riverbanks if the embedding model’s training data used “bank” in both contexts frequently. The neighborhood for financial institutions and the neighborhood for geographical features overlap in the model’s learned geography. You cannot design around neighborhood boundaries because you cannot see the boundaries directly. You can only test and observe where the neighborhoods fall. The map is interpretable only by navigation, not by inspection.
The neighborhood structure also shifts when embedding models update. A new embedding model version produces different coordinates for the same documents. A query that returned document X under the old model may return document Y under the new one, even if both documents are equally relevant. Old and new embeddings live in incompatible coordinate systems, so when you switch embedding models you must re-embed the corpus and rebuild the index to maintain retrieval quality. The city changes when the map is redrawn.
ANN: Approximate Neighborhoods
Exact nearest-neighbor search in high dimensions is expensive. As the number of documents grows, checking every document against the query becomes prohibitive. Approximate Nearest Neighbor (ANN) indexes sacrifice some recall for speed. HNSW, IVF, and PQ indexes are common approaches. They build data structures that prune the search space, exploring only a subset of the document space rather than the whole corpus. The approximation is usually acceptable because the embedding space is probabilistic anyway.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure. Upper layers are coarse navigable graphs; lower layers are finer. Search starts at the top and narrows down. This approach offers excellent query speed with good recall, and is widely used in production vector databases. The trade-off is memory usage: HNSW indexes consume significant RAM. The speed comes from pruning the search space early using coarse navigation.
IVF (Inverted File Index) clusters documents and searches within the most relevant clusters. It reduces the search space by focusing on clusters near the query vector. The recall-speed trade-off is tunable via the number of clusters searched. More clusters searched means better recall but slower queries. IVF is often combined with other indexing methods for hybrid performance.
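A toy sketch of the IVF idea, assuming the cluster centroids are already known (real systems learn them with k-means over the corpus); all ids and coordinates here are invented:

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    # Assign each stored vector to its nearest centroid; the result is
    # one "inverted list" of document ids per cluster.
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest_c = min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))
        lists[nearest_c].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1, top_k=2):
    # Probe only the nprobe clusters whose centroids are closest to the
    # query, then scan just those lists instead of the whole corpus.
    probe = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    return sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:top_k]

# Toy 2-d corpus split into two clusters.
vectors = {"a": (0.0, 0.0), "b": (0.2, 0.1), "c": (5.0, 5.0), "d": (5.2, 4.9)}
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_ivf(vectors, centroids)
print(ivf_search((0.1, 0.0), vectors, centroids, lists, nprobe=1))  # → ['a', 'b']
```

With `nprobe=1` the search never touches documents "c" and "d" at all; raising `nprobe` is exactly the recall-versus-speed dial described above.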
PQ (Product Quantization) compresses vectors by splitting them into subvectors and quantizing each separately. This dramatically reduces memory usage and enables faster distance computations. The compression introduces quantization error that affects recall. PQ is often combined with IVF for a hybrid approach. The compression is lossy; the question is whether the loss matters for your retrieval quality.
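The compression step can be sketched with hand-picked codebooks (real PQ learns each codebook with k-means over its subspace); the vectors below are invented:

```python
def quantize(vec, codebooks):
    # Split the vector into len(codebooks) subvectors and replace each
    # subvector with the index of its nearest codebook entry.
    m = len(codebooks)
    sub_len = len(vec) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vec[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)),
                         key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, book[j]))))
    return codes

def reconstruct(codes, codebooks):
    # Decompress by concatenating the chosen codebook entries. The gap
    # between this and the original vector is the quantization error.
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# Two codebooks of two 2-d entries each: a 4-d float vector compresses
# to two small integers.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)]]
vec = (0.9, 1.1, 0.1, 0.9)
codes = quantize(vec, codebooks)
print(codes, reconstruct(codes, codebooks))  # → [1, 0] [1.0, 1.0, 0.0, 1.0]
```

The reconstructed vector is close to the original but not equal to it: that gap is the lossy compression the paragraph describes, and whether it matters depends on your retrieval quality bar.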
The trade-off is probabilistic. ANN indexes are tuned to return most of the true nearest neighbors quickly, not all of them. For retrieval systems this is usually acceptable: if the embedding space is already probabilistic (neighborhood proximity is not exact relevance), perfect recall from the index is less important than fast retrieval. A retrieval system that returns the top 10 candidates in 50ms is more useful than one that returns the top 10 in 2000ms. Speed and recall are in tension; tune for your use case.
The recall rate matters, though. If your ANN index is configured aggressively for speed and only returns 70% of the true nearest neighbors, 30% of potentially relevant documents are never retrieved. For high-stakes retrieval tasks, measure your recall rate with an annotated test set. Most ANN implementations expose recall as a tuning parameter: you can push it higher at the cost of speed.
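Measuring recall against exact-search ground truth is straightforward; the document ids below are hypothetical:

```python
def recall_at_k(retrieved, relevant, k=10):
    # Fraction of the true nearest neighbors (from exact brute-force
    # search or an annotated test set) that the ANN index returned.
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

# Hypothetical ids: ground truth from exact search, results from ANN.
true_neighbors = ["d1", "d4", "d7", "d9"]
ann_results = ["d1", "d3", "d7", "d8"]
print(recall_at_k(ann_results, true_neighbors, k=4))  # → 0.5
```

A recall of 0.5 on queries like this one would mean half the truly nearest documents never reach the rest of your pipeline, which is the situation the paragraph above warns about.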
Failure Modes
Vector search fails when embedding neighborhoods do not match the query intent. The "banking" collision above is one example, and jargon collisions like it are common in specialized domains. In contract law, "consideration" means something specific; in everyday usage it is a general term. A legal retrieval system using a general-purpose embedding model will hit these collisions. The map was not drawn for this territory.
The approach also struggles with multi-lingual data unless the embedding model handles multiple languages. A query in English matching documents in German requires a multilingual embedding model. If your corpus contains multiple languages and your embedding model only handles one, you are losing access to large portions of your content. Multilingual embedding models typically have lower per-language quality than monolingual models, so this is a trade-off. A map that only shows English street names will not help you navigate a German city.
Exact matches are another gap. Semantic similarity does not guarantee keyword overlap. A document that uses different terminology for the same concept may match semantically, but queries that depend on specific phrasing will miss documents that use synonyms. “Heart attack” and “myocardial infarction” mean the same thing and an embedding model will cluster them. But if your query is a precise legal phrase that has only one correct formulation, embedding-only search may miss the exact match that keyword search would catch. The map connects synonyms but misses proper names.
Long-tail queries are sparse. If a query describes something unusual that does not appear often in the training data, the embedding model’s neighborhood for that concept may be poorly defined. The query lands in a sparse region of the vector space where neighbors are noisy rather than informative. Queries about niche topics, unusual combinations of concepts, or highly specialized jargon may retrieve irrelevant results. The map has no information for neighborhoods nobody visits.
Temporal drift is an underappreciated failure mode. If your corpus changes over time (new products, new terminology, new documents), the embedding space may drift. Documents added six months ago may not cluster with newer documents on the same topics if the embedding model was updated and the old index was not rebuilt. Monitor retrieval quality over time, not just at deployment. The map may be outdated even if the territory has not changed.
The Hybrid Approach
Most production retrieval systems combine vector search with keyword search. The keyword component catches exact matches and jargon collisions. The vector component catches semantic similarity that keyword matching misses. Results are merged, often with a reranking step that considers both scores. The hybrid leverages both navigation methods: the map and the street signs.
The reranking step is where the actual answer quality comes together. A simple merge of scores from two subsystems is often suboptimal. A learned reranker can weight the signals appropriately based on the query type. For factual queries with specific terminology, keyword signals dominate. For conceptual queries, vector signals dominate. A reranker learns this weighting from data.
Reciprocal Rank Fusion (RRF) is a simple but effective merging strategy. Rank documents in each retrieval system, then combine ranks rather than scores: each document's fused score is the sum over systems of 1/(k + rank), where k is a smoothing constant (60 is a common default). Documents that rank highly in either system rank well in the fusion. Because it operates on ranks, RRF is robust to different score scales across retrieval systems. It is simple to implement and often works well as a baseline.
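RRF fits in a few lines; the document ids and result lists below are invented for illustration:

```python
def rrf(rankings, k=60):
    # rankings: one ranked list of doc ids per retrieval system.
    # Each document scores sum(1 / (k + rank)); k=60 is a common default.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a keyword (BM25) system and a vector system.
keyword_hits = ["contract-law", "tort-law", "case-123"]
vector_hits = ["case-123", "contract-law", "riverbank-faq"]
print(rrf([keyword_hits, vector_hits]))
```

Here "contract-law" and "case-123" surface to the top because both systems rank them, while documents found by only one system sink; no score normalization across the two systems is needed.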
Dense retrieval (embeddings) and sparse retrieval (BM25) complement each other. Dense excels at semantic matching; sparse excels at exact term matching. Neither dominates all query types. Hybrid systems get the best of both when properly fused. The map tells you which neighborhoods matter; the index tells you which exact addresses match.
Use vector search when:

- you need semantic similarity matching at scale
- you are building a RAG retrieval layer
- traditional keyword search returns too much noise
- meaning matters more than exact wording in your corpus
- your embedding model is well-suited to your domain

Consider alternatives when:

- exact match is required (keyword search is better)
- your domain has collision-prone jargon and no well-calibrated embedding model
- you need deterministic, auditable retrieval paths
- the corpus is small enough that brute-force search is acceptable
- explainability matters and you need to show why a document matched
- you have multilingual content without a multilingual embedding model
The walk through the neighborhood is faster than checking every address. Whether you find what you want depends on whether the neighborhood was built for your kind of searching, and whether the ANN index is configured to actually take you there. Build the map from your territory, not from someone else’s.