A treasure map says: “Start at the old oak. Go north three miles. Turn east. Follow the river for two miles. The cache is on the south bank, across from the big rock.” Each instruction tells you where to go next based on where you are. The map is not a description of the terrain. It is a sequence of steps through the terrain. To find the treasure, you follow the path.
Knowledge graph traversal works the same way. A KG represents facts as nodes connected by relationships. “Paris” is a node. “Is in” is a relationship. “France” is a node. To answer “What country is Paris in?”, you start at Paris and follow the “is in” relationship to France. To answer “What cities are in France?”, you start at France and follow the “has city” relationship to the cities it connects to. The query is a path. The answer is the destination.
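Here is a minimal sketch of the Paris example. It assumes an in-memory store where each fact is a (subject, relation, object) triple. The class name `TripleStore` and its methods are hypothetical and are not any library's API. Rather than storing a separate “has city” edge, the sketch answers “What cities are in France?” by following “is in” backward. That is why it indexes edges in both directions.

```python
class TripleStore:
    """A minimal in-memory knowledge graph storing (subject, relation, object) triples."""

    def __init__(self):
        # Index edges in both directions so traversal can start from either end.
        self.out_edges = {}  # subject -> list of (relation, object)
        self.in_edges = {}   # object -> list of (relation, subject)

    def add(self, subject, relation, obj):
        self.out_edges.setdefault(subject, []).append((relation, obj))
        self.in_edges.setdefault(obj, []).append((relation, subject))

    def follow(self, node, relation):
        """One hop forward: nodes reached from `node` along `relation` edges."""
        return [o for r, o in self.out_edges.get(node, []) if r == relation]

    def follow_back(self, node, relation):
        """One hop backward: nodes whose `relation` edge points at `node`."""
        return [s for r, s in self.in_edges.get(node, []) if r == relation]


kg = TripleStore()
kg.add("Paris", "is in", "France")
kg.add("Lyon", "is in", "France")
kg.add("Berlin", "is in", "Germany")

print(kg.follow("Paris", "is in"))        # ['France']
print(kg.follow_back("France", "is in"))  # ['Paris', 'Lyon']
```

Storing the inverse relationship explicitly is an equally valid design. It costs extra edges but lets every query run forward.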
Why Graphs Represent Certain Knowledge Well
Some knowledge is inherently relational. The fact that “Alice owns a car” is not a property of Alice alone or of the car alone. It is a relationship between them. The fact that “a car requires fuel” is a relationship between the concept of a car and the concept of fuel. Knowledge graphs represent these relationships explicitly, making them traversable for queries that depend on connection patterns. The relationship is the primary unit of representation, not the entity.
Relational databases can represent this kind of knowledge too, but many graph stores do so without requiring a predefined schema. In a relational database, adding a new relationship type might require schema changes such as a new table or foreign key. In a knowledge graph, you add edges carrying the new relationship type, and the existing structure is unchanged. This flexibility is valuable in domains where relationships are discovered over time.
Consider a supply chain domain. The relationship “supplier A provides component B to facility C” involves three entities, and it might not have been anticipated when the schema was designed. In a relational database, this might require adding tables and foreign keys. In a knowledge graph, you add nodes for the supplier, component, and facility if they do not exist, then connect them. A single edge links only two nodes, so a three-way relationship like this one is typically modeled as an intermediate node for the supply arrangement, with one edge to each participant. The existing data is unaffected.
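A single edge joins exactly two nodes. One way to record the three-way supply fact is therefore to give the fact its own node and link that node to each participant. The sketch below assumes the graph is a plain set of triples. The `supply:` node-naming scheme, the function names, and the second fact about “Component D” are all hypothetical.

```python
# The graph is a plain set of triples, so a new relationship label needs no schema change.
triples = set()


def add_supply(supplier, component, facility):
    """Record "supplier provides component to facility" as one intermediate node
    (hypothetical naming scheme) with a binary edge to each participant."""
    event = f"supply:{supplier}|{component}|{facility}"
    triples.add((event, "supplier", supplier))
    triples.add((event, "component", component))
    triples.add((event, "facility", facility))
    return event


def components_supplied(supplier, facility):
    """Components that `supplier` provides to `facility`, found via the intermediate nodes."""
    events = {s for s, r, o in triples if r == "supplier" and o == supplier}
    events &= {s for s, r, o in triples if r == "facility" and o == facility}
    return sorted(o for s, r, o in triples if s in events and r == "component")


add_supply("Supplier A", "Component B", "Facility C")
add_supply("Supplier A", "Component D", "Facility C")  # hypothetical second fact
print(components_supplied("Supplier A", "Facility C"))  # ['Component B', 'Component D']
```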
The explicit representation of relationships also makes graph traversal useful for questions about paths: “what is the chain of command between the CEO and a front-line employee?” or “what entities are connected to this person through three or fewer relationships?” These queries ask about paths through the relationship structure, not about attributes of individual entities. A relational schema that normalizes entities well can still answer them, but only through repeated self-joins or recursive queries, whose cost and complexity grow with path length.
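The second question is a bounded breadth-first search. The sketch below assumes the graph is an adjacency dict and that relationships can be followed in either direction. `within_hops` and the people in the example are hypothetical.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {node: hop distance} for every node reachable from `start` through at most
    `max_hops` relationships, using breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]  # report neighbors only, not the starting entity itself
    return dist


# Hypothetical relationships, stored in both directions (undirected).
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Dave", "Erin")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

print(within_hops(adjacency, "Alice", 3))  # {'Bob': 1, 'Carol': 2, 'Dave': 3}
```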
Graph Query Languages
KG traversal requires a query language that expresses paths and relationships. SPARQL is the standard for RDF graphs. Cypher is common for property graphs. Gremlin, the traversal language of Apache TinkerPop, is another property-graph option. Each language has different expressiveness and performance characteristics. The query language determines what queries you can express and how efficiently the graph engine can execute them.
Expressiveness matters. Some queries that seem simple require sophisticated path patterns. “Find all people who have a path to this company within three hops” is a reachability query. “Find the shortest path between these two entities” is a shortest path query. Different query languages handle these differently. Some support them natively: Cypher has a shortestPath function, and SPARQL 1.1 property paths can test reachability, though they do not return the path itself. Others require multiple queries or post-processing.
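Both query types reduce to breadth-first search over unweighted edges. Reachability asks only whether the target is ever reached. Shortest path also records each node's predecessor so the path can be rebuilt. The sketch below follows that approach on a hypothetical “manages” graph, in the spirit of the chain-of-command question. `shortest_path` is illustrative and is not any engine's built-in.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search over unweighted, directed edges. Returns the nodes on one
    shortest path from source to target, or None if target is unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk predecessor links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adjacency.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical "manages" edges, pointing from manager to direct report.
manages = {
    "CEO": ["VP Ops", "VP Sales"],
    "VP Ops": ["Plant Manager"],
    "Plant Manager": ["Shift Lead"],
    "Shift Lead": ["Operator"],
}
print(shortest_path(manages, "CEO", "Operator"))
# ['CEO', 'VP Ops', 'Plant Manager', 'Shift Lead', 'Operator']
print(shortest_path(manages, "Operator", "CEO"))  # None: edges are directed
```

A pure reachability query is the same search with the path reconstruction dropped: it returns `True` as soon as the target is dequeued.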
Query optimization is graph-specific. Relational databases have query planners that optimize joins. Graph databases have their own optimizers that consider graph structure, for example choosing which end of a pattern to start from based on how many nodes match it. A query that traverses many edges may be planned differently from one that traverses few. Understanding how your graph engine optimizes queries helps you write traversals that perform well.
The Traversal Cost Problem
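The cost of following a path through a knowledge graph grows with the number of hops, and it grows faster than linearly when nodes have many neighbors: if each node has about d neighbors, a naive k-hop traversal can expand on the order of d^k paths. Deep traversals (many hops to reach an answer) are expensive. A two-hop query is usually fast. A five-hop query can be far slower. A query that must explore many paths to find the right one is slowest of all.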
Large fan-out at intermediate nodes multiplies the search space. If “France” has relationships to thousands of cities, a query asking “what cities are in France” must traverse all of them or find an efficient way to limit the traversal. The graph structure determines query cost as much as the query itself.
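To see how fan-out multiplies the search space, the sketch below counts the paths that a naive hop-by-hop traversal expands at each depth. It runs on a synthetic tree where every node has 10 children. The tree, `frontier_sizes`, and the `max_fanout` cap are hypothetical. Truncating neighbor lists is a crude way to bound cost, and it silently drops answers.

```python
def build_tree(branching, depth):
    """Synthetic tree: every internal node has `branching` children."""
    adjacency, level = {}, ["root"]
    for _ in range(depth):
        next_level = []
        for node in level:
            children = [f"{node}.{i}" for i in range(branching)]
            adjacency[node] = children
            next_level.extend(children)
        level = next_level
    return adjacency


def frontier_sizes(adjacency, start, hops, max_fanout=None):
    """Number of paths a naive traversal holds after each hop. In a tree every path ends
    at a distinct node; in a general graph, shared neighbors would be re-expanded."""
    frontier, sizes = [start], []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            nbrs = adjacency.get(node, [])
            if max_fanout is not None:
                nbrs = nbrs[:max_fanout]  # follow only the first few neighbors
            next_frontier.extend(nbrs)
        frontier = next_frontier
        sizes.append(len(frontier))
    return sizes


tree = build_tree(branching=10, depth=4)
print(frontier_sizes(tree, "root", 4))                # [10, 100, 1000, 10000]
print(frontier_sizes(tree, "root", 4, max_fanout=3))  # [3, 9, 27, 81]
```

With 10 neighbors per node, the work grows tenfold per hop. Capping at 3 holds it to threefold, at the cost of missing most of the nodes.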
Indexing and query planning help, but KG queries are fundamentally different from retrieval queries. A retrieval system finds documents similar to a query. A KG traversal follows explicit relationships step by step. The retrieval system can use approximate matching to find candidates. The graph traversal must find exact matches along relationship paths.
Consider a query asking for “all products that share a supplier with product X.” The traversal starts at X, finds its suppliers, then finds all other products from those suppliers. This is two hops and manageable. But a query asking for “all products that share a supplier with product X through any number of intermediate suppliers” requires traversing the entire supplier network. This can be expensive for large graphs with many suppliers and products.
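The sketch below contrasts the two queries. It uses one reading of “through any number of intermediate suppliers”: two products are related if their upstream supply chains share any supplier. The dicts `supplied_by` and `sources_from`, and all names in them, are hypothetical. The bounded query touches only X's suppliers and their products. The unbounded one computes an upstream closure for every product, which is what makes it expensive at scale.

```python
from collections import defaultdict, deque

supplied_by = {"X": ["S1"], "P2": ["S1"], "P3": ["S2"], "P4": ["S4"]}  # product -> suppliers
sources_from = {"S1": ["S3"], "S2": ["S3"], "S4": []}  # supplier -> upstream suppliers

# Reverse index so the second hop (supplier -> products) is a lookup, not a scan.
products_of = defaultdict(set)
for product, suppliers in supplied_by.items():
    for s in suppliers:
        products_of[s].add(product)


def share_direct_supplier(product):
    """Two hops: product -> its suppliers -> other products of those suppliers."""
    result = set()
    for s in supplied_by.get(product, []):
        result |= products_of[s]
    result.discard(product)
    return result


def upstream(suppliers):
    """Every supplier reachable through any number of 'sources from' hops (BFS)."""
    seen = set(suppliers)
    queue = deque(suppliers)
    while queue:
        for up in sources_from.get(queue.popleft(), []):
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return seen


def share_any_supplier(product):
    """Unbounded depth: products whose upstream chain overlaps this product's chain.
    It computes a closure for every product, i.e. it walks the whole supplier network."""
    mine = upstream(supplied_by.get(product, []))
    return {p for p, sup in supplied_by.items() if p != product and upstream(sup) & mine}


print(sorted(share_direct_supplier("X")))  # ['P2']
print(sorted(share_any_supplier("X")))     # ['P2', 'P3']
```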
Query optimization for knowledge graphs is a specialized skill. Understanding which indexes exist, how the query planner works, and how to structure queries to exploit those indexes affects performance dramatically. A poorly written traversal query can time out where an equivalent well-written query returns in milliseconds.
Graph Construction
A knowledge graph is only as good as its construction. Building a KG from structured data is comparatively straightforward: the data is already in relational form, so the relationships exist in the source and converting them to graph edges is largely mechanical. Building from unstructured text requires extraction: identifying entities, identifying relationships, and converting natural language statements into graph triples. This is a harder problem.
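The mechanical conversion can be sketched as a row-to-triple mapping. Each attribute becomes an edge to a literal value, and each foreign key becomes an edge between entities. The employee table, its columns, and the relation labels are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of an "employees" table; dept_id and manager_id are foreign keys.
EMPLOYEES_CSV = """id,name,dept_id,manager_id
e1,Ana,d1,
e2,Ben,d1,e1
e3,Chen,d2,e1
"""


def rows_to_triples(rows):
    """Map each row to triples: attribute columns become literal-valued edges and
    foreign-key columns become entity-to-entity edges. Empty cells produce no edge."""
    triples = []
    for row in rows:
        subject = row["id"]
        triples.append((subject, "name", row["name"]))
        triples.append((subject, "works in", row["dept_id"]))
        if row["manager_id"]:
            triples.append((subject, "reports to", row["manager_id"]))
    return triples


for t in rows_to_triples(csv.DictReader(io.StringIO(EMPLOYEES_CSV))):
    print(t)
```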
Extraction quality determines graph quality. Missed entities mean missing nodes. Missed relationships mean missing edges. Wrongly extracted relationships mean wrong edges. All three errors compound in traversal, producing incomplete or incorrect answers. A traversal that cannot find the right answer because the edge was never created is a different failure mode than a traversal that finds the wrong answer.
Named entity recognition identifies mentions of people, places, organizations, and other significant concepts in text. Relation extraction identifies the relationships between these entities. Both tasks are imperfect, and errors propagate into the graph. A relation extractor that confuses “owns” with “operates” creates edges that represent the wrong relationship.
Human validation of extracted knowledge is expensive but improves reliability. Automated validation against known answers is cheaper but only catches errors in coverage areas you thought to test. Coverage gaps remain invisible. A graph that passes all automated validation tests may still be missing edges that were never tested.
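Here is a minimal sketch of automated validation. It assumes a hand-built test set of facts known to be true and facts known to be false. `validate` and the Acme data are hypothetical. The example includes an “owns” fact that was extracted as “operates”. The `untested` count makes the blind spot explicit: every fact outside the test set passes by default.

```python
def validate(graph_triples, known_true, known_false=()):
    """Compare a graph against a test set of known answers. Reports expected facts that
    are missing, known-wrong facts that are present, and how many graph facts the test
    set never examined."""
    graph = set(graph_triples)
    missing = [t for t in known_true if t not in graph]
    wrong = [t for t in known_false if t in graph]
    untested = graph - set(known_true) - set(known_false)
    return {"missing": missing, "wrong": wrong, "untested": len(untested)}


graph = [
    ("Acme", "owns", "Plant 1"),
    ("Acme", "operates", "Plant 2"),  # extractor chose "operates" where the truth is "owns"
    ("Plant 2", "is in", "Ohio"),
]
known_true = [("Acme", "owns", "Plant 1"), ("Acme", "owns", "Plant 2")]
known_false = [("Acme", "owns", "Plant 3")]
print(validate(graph, known_true, known_false))
# {'missing': [('Acme', 'owns', 'Plant 2')], 'wrong': [], 'untested': 2}
```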
Graph construction is not a one-time effort. Real-world domains evolve: new entities appear and new relationships emerge. A knowledge graph built last year may be incomplete this year. Keeping the graph current requires ongoing extraction and integration effort.
When Graphs and Retrieval Complement Each Other
Retrieval excels at finding content. Graphs excel at traversing relationships. The combination is powerful: use retrieval to find candidate documents, use graph traversal to verify or expand the answer through known relationships. Neither approach alone handles all queries well; together they cover more ground.
A legal research system might retrieve documents relevant to a case, then use a graph of legal precedents to identify which precedents support or contradict the retrieved positions. The retrieval finds relevant text. The graph finds related cases. The combination supports better legal reasoning than either alone.
A product recommendation system might retrieve customer reviews, then use a product relationship graph to find related products that the customer has not yet reviewed. The retrieval finds what the customer is interested in. The graph finds what is related to that interest. The combination produces better recommendations than either approach alone.
The hybrid approach requires integration engineering. The retrieval system and graph system must be connected in ways that let answers flow between them. This is more complex than either approach alone but can produce better results for domains where both content and relationships matter.
The integration can be loose or tight. Loose integration uses retrieval to find candidate documents, then uses graph traversal to filter or rank them. Tight integration embeds graph structure into the retrieval representation, so graph relationships influence which documents are retrieved. Tight integration is more powerful but more complex to build and maintain.
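Here is a sketch of loose integration. Any retriever supplies scored candidate documents, and the graph adds a bonus to documents that mention entities close to the query's entities. The scoring rule is a hypothetical choice made for illustration: a bonus of `weight / (1 + hops)` for the nearest entity within `max_hops`. The document-to-entity mapping and the case data are likewise hypothetical.

```python
from collections import deque


def hop_distances(adjacency, seeds, max_hops):
    """Multi-source BFS: hop distance from the nearest seed entity, up to max_hops."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] >= max_hops:
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def rerank(candidates, doc_entities, adjacency, query_entities, max_hops=2, weight=0.5):
    """Candidates are (doc_id, retrieval_score) pairs. Each doc gains weight / (1 + hops)
    for its entity nearest a query entity; docs with no nearby entity gain nothing."""
    dist = hop_distances(adjacency, query_entities, max_hops)
    scored = []
    for doc_id, score in candidates:
        hops = [dist[e] for e in doc_entities.get(doc_id, []) if e in dist]
        bonus = weight / (1 + min(hops)) if hops else 0.0
        scored.append((doc_id, score + bonus))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


adjacency = {  # hypothetical citation links between cases, stored in both directions
    "Case A": ["Case B"],
    "Case B": ["Case A", "Case C"],
    "Case C": ["Case B"],
}
candidates = [("doc1", 0.80), ("doc2", 0.75), ("doc3", 0.70)]  # from any retriever
doc_entities = {"doc1": ["Case Z"], "doc2": ["Case C"], "doc3": ["Case B"]}
ranked = rerank(candidates, doc_entities, adjacency, ["Case A"])
print([doc for doc, _ in ranked])  # ['doc3', 'doc2', 'doc1']
```

Tight integration would instead feed graph structure into the retrieval representation itself, so it cannot be expressed as a post-processing step like this one.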
The Completeness Problem
A knowledge graph represents only the knowledge that was added to it. If the relationship between two entities was never extracted and added, traversal cannot find it. A KG is always incomplete relative to the full complexity of the domain. The knowledge graph is a map of the territory, and the map is always smaller than the territory.
This differs from retrieval, which can surface information that was never explicitly encoded as structured facts, as long as it appears in some indexed document. Retrieval has its own gap: if the answer is only in an unindexed document, retrieval fails. If the answer requires a relationship that was never graphed, graph traversal fails in a different way. Retrieval fails silently: the relevant document is simply absent from results that otherwise look plausible. Graph traversal fails completely: the relationship does not exist, so the path cannot be followed at all.
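Knowing what your graph does not contain is as important as knowing what it does contain. Coverage analysis identifies gaps in the graph that limit traversal answers. If your graph covers only 60% of the entities in your domain, a query starting from a uniformly chosen entity has about a 40% chance of finding no starting node. Multi-hop queries fare worse, because every entity on the path must be present. The real failure rate depends on which entities users actually ask about. Coverage analysis tells you where to focus graph construction efforts.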
Completeness is not binary. A graph that has 80% of entities and 40% of relationships is more useful than one with 60% of entities and 20% of relationships, even though both are incomplete. Knowing the coverage percentages for both entities and relationships helps prioritize construction effort.
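Coverage can be measured separately for entities and relationships against a reference inventory, such as a hand-labeled sample of the domain. The sketch below assumes such an inventory exists. The function names and data are hypothetical. It also shows why multi-hop queries suffer more than entity coverage alone suggests. This rests on a simplifying assumption: each entity a query needs is present independently with the same probability.

```python
def coverage(graph_entities, graph_relations, reference_entities, reference_relations):
    """Fraction of a reference inventory present in the graph, reported separately
    for entities and for relationships."""
    ref_e, ref_r = set(reference_entities), set(reference_relations)
    ent = len(ref_e & set(graph_entities)) / len(ref_e)
    rel = len(ref_r & set(graph_relations)) / len(ref_r)
    return ent, rel


def path_success(entity_coverage, entities_needed):
    """Chance that every entity a query needs is present, assuming each is covered
    independently with probability `entity_coverage` (a simplification)."""
    return entity_coverage ** entities_needed


reference_entities = ["A", "B", "C", "D", "E"]
reference_relations = [("A", "r", "B"), ("B", "r", "C"), ("C", "r", "D"),
                       ("D", "r", "E"), ("A", "r", "E")]
graph_entities = ["A", "B", "C", "X"]
graph_relations = [("A", "r", "B"), ("A", "r", "X")]
print(coverage(graph_entities, graph_relations, reference_entities, reference_relations))
# (0.6, 0.2)

for k in (1, 2, 3):
    print(k, round(path_success(0.6, k), 3))
# 1 0.6
# 2 0.36
# 3 0.216
```

Under that assumption, at 60% entity coverage a query that needs three specific entities succeeds only about 22% of the time.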
Traversal Failure Modes
Graph traversal can fail in distinct ways. Missing node failure: the starting entity does not exist in the graph, so no traversal is possible. Missing edge failure: the relationship needed for traversal does not exist, even though both nodes exist. Explosion failure: the fan-out at an intermediate node is so large that traversal times out or runs out of memory.
Each failure mode requires different remediation. Missing nodes require more comprehensive entity extraction. Missing edges require better relationship extraction or different query construction. Explosion failures require query optimization or graph partitioning to reduce fan-out.
Understanding failure modes requires monitoring traversal outcomes. What percentage of traversals complete successfully? What percentage fail at each stage? Which entity types or relationship types fail most frequently? This diagnostic information guides graph improvement efforts.
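Here is a sketch of a traversal that reports which of the three failure modes it hit, plus a counter that answers the monitoring questions above. It assumes each query is a fixed sequence of relation labels. It uses a hypothetical `max_frontier` budget as a stand-in for a timeout or memory limit. `traverse`, `Outcome`, and the example graph are illustrative.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    MISSING_NODE = "missing node"
    MISSING_EDGE = "missing edge"
    EXPLOSION = "explosion"


def traverse(nodes, out_edges, start, relations, max_frontier=1000):
    """Follow `relations` hop by hop from `start` and classify the result.
    `out_edges` maps node -> list of (relation, target)."""
    if start not in nodes:
        return Outcome.MISSING_NODE, set()
    frontier = {start}
    for rel in relations:
        frontier = {t for n in frontier for r, t in out_edges.get(n, []) if r == rel}
        if not frontier:
            return Outcome.MISSING_EDGE, set()  # no edge with this label leaves the current nodes
        if len(frontier) > max_frontier:
            return Outcome.EXPLOSION, set()  # budget exceeded; checked after each hop
    return Outcome.OK, frontier


nodes = {"Paris", "Lyon", "France", "Europe"}
out_edges = {
    "Paris": [("is in", "France")],
    "Lyon": [("is in", "France")],
    "France": [("is in", "Europe")],
}
queries = [
    ("Paris", ["is in", "is in"]),   # Paris -> France -> Europe
    ("Lyon", ["is in", "borders"]),  # no "borders" edge was ever added
    ("Rome", ["is in"]),             # Rome was never added as a node
]
log = Counter()
for start, rels in queries:
    outcome, _ = traverse(nodes, out_edges, start, rels)
    log[outcome] += 1
print({o.value: n for o, n in log.items()})
# {'ok': 1, 'missing edge': 1, 'missing node': 1}
```

Broken down further by starting entity type or by relation label, the same counter shows which types fail most often.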
Decision Rules
Use knowledge graphs when:
- Your domain involves explicit relationships that matter (ownership, causality, hierarchy, geography)
- Queries ask about paths through relationships (who owns what, what causes what)
- You need to answer questions about connections, not just content
- Data comes from multiple sources with different schemas that need integration
- Relationship accuracy is more important than content recall
Do not use knowledge graphs when:
- Your queries are about document content rather than relationships
- The relationship structure is shallow and could be flattened into attributes
- Traversal latency is unacceptable for your use case
- Maintaining graph consistency is more work than the query capability is worth
- You cannot achieve enough graph coverage for the queries you need to answer
Accept that:
- Graph completeness is always limited by extraction quality
- Traversal depth affects both latency and accuracy
- Coverage analysis is needed to understand graph limitations
- Graphs and retrieval are complementary, not competing approaches
The treasure map works because it describes a path through known terrain. KG traversal works when your queries are about paths through explicit relationships. When your queries are about content, use retrieval. When your queries are about connections, use graphs.