Knowledge Graphs for Enterprise AI
Enterprise AI systems often lack contextual understanding of organizational knowledge and operate in isolated silos. Knowledge graphs address these limitations by providing a semantic layer that connects information across the enterprise.
What are Knowledge Graphs?
Knowledge graphs are structured representations of facts, concepts, and their relationships. Unlike traditional databases that store information as tables, knowledge graphs store information as a network of interlinked entities and relationships.
At their core, knowledge graphs consist of:
- Entities: Objects or concepts (products, people, documents, etc.)
- Relationships: Connections between entities (works-for, contains, depends-on)
- Attributes: Properties that describe entities (name, date, status)
- Ontology: The schema or model that defines types of entities and relationships
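At its simplest, this structure can be modeled as a set of subject-predicate-object triples. The sketch below uses invented entity names purely for illustration:

```python
# A knowledge graph reduced to its essentials: a set of
# (subject, predicate, object) triples. All names are invented.
triples = {
    ("alice", "works_for", "acme"),
    ("acme", "sells", "widget_pro"),
    ("widget_pro", "depends_on", "widget_core"),
}

def neighbors(entity, triples):
    """Return sorted (relationship, target) pairs for edges leaving `entity`."""
    return sorted((p, o) for s, p, o in triples if s == entity)

print(neighbors("acme", triples))
```

Real systems add an ontology on top of this raw structure, which is what the next sections build up.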
Why Knowledge Graphs Matter for Enterprise AI
Knowledge graphs solve several key challenges in enterprise AI:
- Context and relevance: They provide essential context for AI applications to make more informed recommendations and decisions
- Unified knowledge: They break down silos by connecting information across departmental boundaries
- Explainability: They improve the explainability of AI by making relationships explicit
- Domain knowledge incorporation: They capture and formalize human expertise
Building an Enterprise Knowledge Graph
Creating an effective enterprise knowledge graph involves several key stages:
1. Define Your Ontology
The ontology is the conceptual framework for your knowledge graph. Start by identifying:
- What key entity types will your graph contain?
- What relationships exist between them?
- What attributes will each entity have?
```python
# Example ontology definition using RDFLib in Python
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, RDFS, XSD

# Define namespaces
ENTERPRISE = Namespace("https://enterprise.com/ontology#")
PRODUCT = Namespace("https://enterprise.com/product#")
CUSTOMER = Namespace("https://enterprise.com/customer#")

# Create a graph
g = Graph()

# Define classes (entity types)
g.add((ENTERPRISE.Product, RDF.type, RDFS.Class))
g.add((ENTERPRISE.Customer, RDF.type, RDFS.Class))
g.add((ENTERPRISE.Employee, RDF.type, RDFS.Class))
g.add((ENTERPRISE.Department, RDF.type, RDFS.Class))

# Define relationships
g.add((ENTERPRISE.hasCustomer, RDF.type, RDF.Property))
g.add((ENTERPRISE.hasCustomer, RDFS.domain, ENTERPRISE.Product))
g.add((ENTERPRISE.hasCustomer, RDFS.range, ENTERPRISE.Customer))

g.add((ENTERPRISE.worksIn, RDF.type, RDF.Property))
g.add((ENTERPRISE.worksIn, RDFS.domain, ENTERPRISE.Employee))
g.add((ENTERPRISE.worksIn, RDFS.range, ENTERPRISE.Department))

# Define attributes
g.add((ENTERPRISE.name, RDF.type, RDF.Property))
g.add((ENTERPRISE.name, RDFS.domain, RDFS.Resource))
g.add((ENTERPRISE.name, RDFS.range, XSD.string))

g.add((ENTERPRISE.startDate, RDF.type, RDF.Property))
g.add((ENTERPRISE.startDate, RDFS.domain, ENTERPRISE.Employee))
g.add((ENTERPRISE.startDate, RDFS.range, XSD.date))

# Export the ontology
g.serialize(destination="enterprise_ontology.ttl", format="turtle")
```
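For reference, the serialized Turtle file produced by this code looks roughly like the fragment below (rdflib may choose different prefix names, and the full file contains every triple added above):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ent: <https://enterprise.com/ontology#> .

ent:Product a rdfs:Class .
ent:Employee a rdfs:Class .
ent:Department a rdfs:Class .

ent:worksIn a rdf:Property ;
    rdfs:domain ent:Employee ;
    rdfs:range ent:Department .

ent:startDate a rdf:Property ;
    rdfs:domain ent:Employee ;
    rdfs:range xsd:date .
```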
2. Data Integration and Ingestion
To populate your knowledge graph, you’ll need to integrate data from multiple sources:
- Structured data: Databases, CRM systems, ERP systems
- Semi-structured data: JSON APIs, XML files
- Unstructured data: Documents, emails, wikis
Here’s a Python example of how you might extract entities from various data sources:
```python
import pandas as pd
import spacy
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, XSD

# Namespaces from the ontology example above
ENTERPRISE = Namespace("https://enterprise.com/ontology#")
PRODUCT = Namespace("https://enterprise.com/product#")

# Load NLP model for entity extraction from text
nlp = spacy.load("en_core_web_lg")

# Function to extract entities from structured data
def extract_from_database(conn, graph):
    # Example: Extract product data from a relational database
    products = pd.read_sql("SELECT id, name, category, launch_date FROM products", conn)
    for _, row in products.iterrows():
        product_uri = URIRef(f"{PRODUCT}{row['id']}")
        graph.add((product_uri, RDF.type, ENTERPRISE.Product))
        graph.add((product_uri, ENTERPRISE.name, Literal(row['name'])))
        graph.add((product_uri, ENTERPRISE.category, Literal(row['category'])))
        graph.add((product_uri, ENTERPRISE.launchDate,
                   Literal(row['launch_date'], datatype=XSD.date)))

# Function to extract entities from unstructured text
def extract_from_document(doc_text, graph):
    doc = nlp(doc_text)
    # Extract named entities using NLP
    for entity in doc.ents:
        if entity.label_ == "PERSON":
            # Create or link to employee entities
            employee_uri = URIRef(f"{ENTERPRISE}employee/{entity.text.replace(' ', '_')}")
            graph.add((employee_uri, RDF.type, ENTERPRISE.Employee))
            graph.add((employee_uri, ENTERPRISE.name, Literal(entity.text)))
        elif entity.label_ == "ORG":
            # Create or link to organization entities
            org_uri = URIRef(f"{ENTERPRISE}organization/{entity.text.replace(' ', '_')}")
            graph.add((org_uri, RDF.type, ENTERPRISE.Organization))
            graph.add((org_uri, ENTERPRISE.name, Literal(entity.text)))
```
3. Knowledge Graph Storage and Management
Several technologies are available for storing and managing knowledge graphs:
Graph Databases:
- Neo4j: Popular graph database with the Cypher query language
- Amazon Neptune: Fully managed graph database service
- ArangoDB: Multi-model database supporting graphs
Triple Stores (RDF):
- GraphDB: Enterprise-grade RDF and graph database
- Stardog: Knowledge graph platform with SPARQL support
- Apache Jena Fuseki: Open-source RDF database
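For RDF triple stores, loading data can be as simple as an HTTP POST via the SPARQL Graph Store protocol. The sketch below targets a hypothetical local Fuseki dataset named `enterprise`; the URL and dataset name are assumptions for illustration:

```python
import urllib.request

# A tiny Turtle payload to upload
ttl = b"""@prefix ent: <https://enterprise.com/ontology#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ent:Product a rdfs:Class .
"""

# Graph Store protocol endpoint of a (hypothetical) local Fuseki dataset
req = urllib.request.Request(
    "http://localhost:3030/enterprise/data?default",
    data=ttl,
    method="POST",
    headers={"Content-Type": "text/turtle"},
)
# urllib.request.urlopen(req)  # uncomment with a running Fuseki instance
print(req.get_method(), req.get_full_url())
```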
Here’s an example of loading data into Neo4j:
```python
from neo4j import GraphDatabase

class KnowledgeGraphLoader:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def load_product(self, product_id, name, category):
        with self.driver.session() as session:
            # execute_write requires neo4j driver >= 5.0;
            # older drivers use the deprecated write_transaction
            session.execute_write(self._create_product, product_id, name, category)

    @staticmethod
    def _create_product(tx, product_id, name, category):
        # Create (or update) the product node
        query = (
            "MERGE (p:Product {id: $product_id}) "
            "SET p.name = $name, p.category = $category "
            "RETURN p"
        )
        result = tx.run(query, product_id=product_id, name=name, category=category)
        return result.single()

    def link_product_to_customer(self, product_id, customer_id, relationship_type):
        with self.driver.session() as session:
            session.execute_write(
                self._create_relationship,
                product_id,
                customer_id,
                relationship_type
            )

    @staticmethod
    def _create_relationship(tx, product_id, customer_id, relationship_type):
        # Relationship types cannot be parameterized in Cypher, so
        # relationship_type is interpolated into the query string --
        # validate it against an allow-list to avoid Cypher injection
        query = (
            f"MATCH (p:Product {{id: $product_id}}), (c:Customer {{id: $customer_id}}) "
            f"MERGE (p)-[r:{relationship_type}]->(c) "
            f"RETURN p, r, c"
        )
        result = tx.run(query, product_id=product_id, customer_id=customer_id)
        return result.single()
```
4. Knowledge Graph Enrichment
Once your basic knowledge graph is established, you can enrich it with:
- Inference and reasoning: Derive new facts from existing information
- Entity resolution: Identify and merge duplicates
- Knowledge graph embeddings: Create vector representations of entities and relationships
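To make the embeddings bullet concrete, here is a minimal sketch of the TransE scoring idea. Production systems train these vectors with libraries such as PyKEEN or DGL-KE; the two-dimensional vectors below are hand-picked toy values, not learned embeddings:

```python
# TransE models a relation as a translation in vector space:
# embedding(head) + embedding(relation) ≈ embedding(tail)
# for true triples. The score is the distance after translation.

def l2_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def transe_score(head, relation, tail):
    """Lower distance => more plausible triple."""
    translated = [h + r for h, r in zip(head, relation)]
    return l2_distance(translated, tail)

# Toy 2-D embeddings (hand-picked for illustration)
emb = {
    "acme":       [1.0, 0.0],
    "widget_pro": [1.0, 1.0],
    "sells":      [0.0, 1.0],
}

# A true triple scores near 0; a corrupted triple scores higher
true_score = transe_score(emb["acme"], emb["sells"], emb["widget_pro"])
fake_score = transe_score(emb["widget_pro"], emb["sells"], emb["acme"])
print(true_score, fake_score)
```

Once trained, such embeddings support link prediction (suggesting missing edges) and similarity search over entities.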
Here’s an example of entity resolution:
```python
from py_stringmatching.similarity_measure.levenshtein import Levenshtein
from py_stringmatching.similarity_measure.jaccard import Jaccard

def resolve_entities(graph, entity_type, threshold=0.85):
    """Find and suggest merging of similar entities.

    `graph` is expected to expose a Neo4j driver, e.g. the
    KnowledgeGraphLoader defined earlier.
    """
    # Get all entities of the given type
    query = f"""
    MATCH (e:{entity_type})
    RETURN e.id AS id, e.name AS name
    """
    with graph.driver.session() as session:
        entities = session.run(query).data()

    # String similarity measures
    lev = Levenshtein()
    jac = Jaccard()
    potential_matches = []

    # Compare each pair of entities
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            name1 = entities[i]['name']
            name2 = entities[j]['name']
            # Character-level and token-level similarities
            lev_sim = lev.get_sim_score(name1, name2)
            jac_sim = jac.get_sim_score(name1.split(), name2.split())
            # Combined score
            combined_score = (lev_sim + jac_sim) / 2
            if combined_score > threshold:
                potential_matches.append({
                    'id1': entities[i]['id'],
                    'id2': entities[j]['id'],
                    'name1': name1,
                    'name2': name2,
                    'score': combined_score
                })
    return potential_matches
```
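The same pairwise-similarity idea works with only the standard library: `difflib.SequenceMatcher` gives a ratio that can stand in for the Levenshtein score when py_stringmatching is not installed. The company names below are invented:

```python
from difflib import SequenceMatcher
from itertools import combinations

def fuzzy_duplicates(names, threshold=0.85):
    """Flag name pairs whose case-insensitive similarity exceeds the threshold."""
    matches = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score > threshold:
            matches.append((a, b, round(score, 2)))
    return matches

names = ["Acme Corp", "ACME Corp.", "Globex", "Acme Corporation"]
print(fuzzy_duplicates(names))
```

Note the threshold trade-off: "Acme Corporation" falls below 0.85 against "Acme Corp", so a human review step for borderline pairs is usually worthwhile.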
Using Knowledge Graphs to Enhance Enterprise AI
Now that we have established a knowledge graph, let’s look at how it can enhance AI applications across the enterprise:
1. Enhancing LLM-based Chatbots
Knowledge graphs can provide company-specific context to large language models:
```python
from openai import OpenAI  # openai >= 1.0 client
from neo4j import GraphDatabase

class KnowledgeEnhancedChatbot:
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password, openai_api_key):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
        self.client = OpenAI(api_key=openai_api_key)

    def answer_query(self, user_query):
        # Retrieve relevant context from the knowledge graph
        context = self._get_relevant_context(user_query)
        # Combine the user query with that context for the LLM
        prompt = f"""
        You are an enterprise assistant with access to company-specific information.
        Use the following company information to answer the user's question:

        {context}

        User question: {user_query}
        """
        # Get a response from OpenAI
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": prompt}]
        )
        return response.choices[0].message.content

    def _get_relevant_context(self, query):
        # Extract entities from the query
        entities = self._extract_entities(query)
        # Fetch the subgraph within two hops of those entities
        cypher_query = """
        MATCH path = (n)-[*1..2]-(m)
        WHERE n.name IN $entity_names
        RETURN path
        LIMIT 50
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, entity_names=entities)
            # Format the results as text
            return self._format_graph_results(result)

    def _extract_entities(self, text):
        # Simple implementation -- in production, use NER
        # or entity linking to extract relevant entities
        keywords = text.lower().split()
        # Query the knowledge graph for entities matching keywords
        query = """
        MATCH (n)
        WHERE any(keyword IN $keywords WHERE toLower(n.name) CONTAINS keyword)
        RETURN DISTINCT n.name AS entity_name
        LIMIT 5
        """
        with self.driver.session() as session:
            result = session.run(query, keywords=keywords)
            return [record["entity_name"] for record in result]

    def _format_graph_results(self, results):
        # Render each path as "start REL_TYPE end" lines
        # (simplified; a production version would handle direction and attributes)
        formatted_text = "Company Knowledge:\n"
        for record in results:
            path = record["path"]
            nodes = list(path.nodes)
            rels = list(path.relationships)
            for i, rel in enumerate(rels):
                start_node = nodes[i]
                end_node = nodes[i + 1]
                formatted_text += f"- {start_node['name']} {rel.type} {end_node['name']}\n"
        return formatted_text
```
2. Intelligent Document Search and Recommendation
Knowledge graphs can transform document search from keyword-matching to semantic understanding:
```python
from neo4j import GraphDatabase

class KnowledgeGraphDocumentSearch:
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))

    def semantic_document_search(self, user_query, user_id=None):
        # Extract concepts from the query
        concepts = self._extract_concepts(user_query)
        # Construct a query based on the concepts and user context
        cypher_query = """
        // Find documents related to the concepts in the query
        MATCH (d:Document)-[:ABOUT]->(c:Concept)
        WHERE c.name IN $concepts
        // If we have user context, factor in their department and role
        OPTIONAL MATCH (u:User {id: $user_id})-[:WORKS_IN]->(dept:Department)
        OPTIONAL MATCH (u)-[:HAS_ROLE]->(role:Role)
        // Calculate relevance based on concept matches and user context
        WITH d,
             count(DISTINCT c) AS conceptMatches,
             CASE WHEN u IS NOT NULL
                  THEN (CASE WHEN d.department = dept.name THEN 2 ELSE 0 END) +
                       (CASE WHEN d.audience = role.name THEN 1 ELSE 0 END)
                  ELSE 0
             END AS userContextScore
        // Calculate the final score and return the top results
        WITH d, conceptMatches * 3 + userContextScore AS relevanceScore
        ORDER BY relevanceScore DESC
        LIMIT 10
        RETURN d.title, d.url, d.summary, relevanceScore
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, concepts=concepts, user_id=user_id)
            documents = [{"title": record["d.title"],
                          "url": record["d.url"],
                          "summary": record["d.summary"],
                          "score": record["relevanceScore"]}
                         for record in result]
        return documents

    def related_document_recommendations(self, document_id):
        # Find documents related to the current document through shared concepts
        cypher_query = """
        // Start with the current document
        MATCH (current:Document {id: $document_id})
        // Find concepts related to this document
        MATCH (current)-[:ABOUT]->(c:Concept)
        // Find other documents about the same concepts
        MATCH (other:Document)-[:ABOUT]->(c)
        WHERE other <> current
        // Count shared concepts and calculate similarity
        WITH other, count(DISTINCT c) AS sharedConcepts
        ORDER BY sharedConcepts DESC
        LIMIT 5
        RETURN other.title, other.url, other.summary, sharedConcepts
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, document_id=document_id)
            recommendations = [{"title": record["other.title"],
                                "url": record["other.url"],
                                "summary": record["other.summary"],
                                "shared_concepts": record["sharedConcepts"]}
                               for record in result]
        return recommendations

    def _extract_concepts(self, query):
        # In a real implementation, use NLP techniques to extract concepts
        # This is a simplified placeholder
        return query.lower().split()
```
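The placeholder `_extract_concepts` splits on whitespace, which floods the Cypher `IN` clause with stopwords. Even without an NLP library, a simple stopword filter improves matters; the word list below is abbreviated and purely illustrative:

```python
# A tiny stopword filter for query-concept extraction.
# The stopword list is abbreviated and illustrative only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "for", "to", "in", "on",
             "how", "do", "i", "what", "where", "and", "or"}

def extract_concepts(query):
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS]

print(extract_concepts("How do I configure the deployment pipeline?"))
```

A production pipeline would go further: lemmatization, noun-phrase chunking, and entity linking against the concept nodes already in the graph.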
3. Enhanced Customer 360 View
Knowledge graphs excel at connecting customer data across silos:
```python
from neo4j import GraphDatabase

class CustomerKnowledgeGraph:
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))

    def get_customer_360(self, customer_id):
        # Comprehensive query that brings together all customer information
        cypher_query = """
        // Start with the customer
        MATCH (c:Customer {id: $customer_id})
        // Get basic customer information
        WITH c
        // Get products the customer has purchased
        OPTIONAL MATCH (c)-[purchase:PURCHASED]->(p:Product)
        WITH c, collect({product: p, purchaseDate: purchase.date,
                         amount: purchase.amount}) AS purchases
        // Get support tickets
        OPTIONAL MATCH (c)-[:SUBMITTED]->(t:Ticket)
        WITH c, purchases, collect(t) AS tickets
        // Get marketing interactions
        OPTIONAL MATCH (c)-[int:INTERACTED_WITH]->(camp:Campaign)
        WITH c, purchases, tickets,
             collect({campaign: camp, date: int.date, channel: int.channel}) AS interactions
        // Get customer satisfaction surveys
        OPTIONAL MATCH (c)-[:RESPONDED_TO]->(s:Survey)
        WITH c, purchases, tickets, interactions, collect(s) AS surveys
        // Get people who have interacted with this customer
        OPTIONAL MATCH (e:Employee)-[rel:CONTACTED]->(c)
        WITH c, purchases, tickets, interactions, surveys,
             collect({employee: e, role: e.role, date: rel.date}) AS contacts
        // Return the complete customer view
        RETURN c, purchases, tickets, interactions, surveys, contacts
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, customer_id=customer_id)
            record = result.single()
            if not record:
                return None
            # Process the result into a structured format
            return self._process_customer_record(record)

    def get_product_recommendations(self, customer_id):
        # Knowledge graph-based product recommendations
        cypher_query = """
        // Find products purchased by similar customers
        MATCH (c:Customer {id: $customer_id})-[:PURCHASED]->(p:Product)
        MATCH (other:Customer)-[:PURCHASED]->(p)
        MATCH (other)-[:PURCHASED]->(rec:Product)
        WHERE NOT (c)-[:PURCHASED]->(rec)
        // Calculate recommendation score based on customer similarity
        WITH rec, count(DISTINCT other) AS customerOverlap
        // Also consider product category preferences
        OPTIONAL MATCH (c:Customer {id: $customer_id})-[:PURCHASED]->
                       (:Product)-[:HAS_CATEGORY]->(cat:Category)
        OPTIONAL MATCH (rec)-[:HAS_CATEGORY]->(cat)
        WITH rec, customerOverlap, count(DISTINCT cat) AS categoryMatch
        // Calculate the final score and return top recommendations
        WITH rec, customerOverlap * 2 + categoryMatch * 3 AS recommendationScore
        ORDER BY recommendationScore DESC
        LIMIT 5
        RETURN rec.id, rec.name, rec.price, rec.category, recommendationScore
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, customer_id=customer_id)
            recommendations = [{"id": record["rec.id"],
                                "name": record["rec.name"],
                                "price": record["rec.price"],
                                "category": record["rec.category"],
                                "score": record["recommendationScore"]}
                               for record in result]
        return recommendations

    def _process_customer_record(self, record):
        # Process a Neo4j record into a structured customer profile
        # In a real implementation, this would extract all the nested data
        customer_node = record["c"]
        customer = {
            "id": customer_node["id"],
            "name": customer_node["name"],
            "email": customer_node["email"],
            "company": customer_node.get("company"),
            "industry": customer_node.get("industry"),
            "created_date": customer_node.get("created_date"),
            "purchases": [self._format_purchase(p) for p in record["purchases"]],
            "tickets": [self._format_ticket(t) for t in record["tickets"]],
            "marketing_interactions": [self._format_interaction(i) for i in record["interactions"]],
            "surveys": [self._format_survey(s) for s in record["surveys"]],
            "contacts": [self._format_contact(c) for c in record["contacts"]]
        }
        return customer

    # Helper methods for formatting different relationship types
    def _format_purchase(self, purchase_data):
        product = purchase_data["product"]
        return {
            "product_id": product["id"],
            "product_name": product["name"],
            "purchase_date": purchase_data["purchaseDate"],
            "amount": purchase_data["amount"]
        }

    # Additional formatting methods would follow for other relationship types
```
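The same 360 idea can be prototyped without a graph database by joining per-silo records on a shared customer key. The toy data below uses invented names and IDs purely to show the shape of the merged profile:

```python
# Toy per-silo data keyed by a shared customer id (invented values)
crm = {"c1": {"name": "Dana", "email": "dana@example.com"}}
support = {"c1": [{"ticket": "T-17", "status": "open"}]}
purchases = {"c1": [{"product": "widget_pro", "amount": 99.0}]}

def customer_360(customer_id):
    """Merge all silos into one profile dict for the given customer."""
    profile = dict(crm.get(customer_id, {}))
    profile["tickets"] = support.get(customer_id, [])
    profile["purchases"] = purchases.get(customer_id, [])
    return profile

view = customer_360("c1")
print(view["name"], len(view["tickets"]), len(view["purchases"]))
```

The graph version earns its keep when the joins become multi-hop and heterogeneous (employees who contacted customers who bought products that contain components), which is exactly what the Cypher query above expresses.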
4. Reasoning and Inference for Decision Support
Knowledge graphs can support complex reasoning and inference for decision making:
```python
from neo4j import GraphDatabase

class KnowledgeGraphReasoner:
    def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))

    def product_risk_assessment(self, product_id):
        # Use the knowledge graph to assess risks related to a product
        cypher_query = """
        // Start with the product
        MATCH (p:Product {id: $product_id})
        // Get components and their suppliers
        MATCH (p)-[:CONTAINS]->(c:Component)-[:SUPPLIED_BY]->(s:Supplier)
        // Assess supplier risks
        WITH p, c, s,
             CASE WHEN s.reliability < 0.7 THEN 'High supplier reliability risk'
                  WHEN s.reliability < 0.9 THEN 'Medium supplier reliability risk'
                  ELSE 'Low supplier reliability risk'
             END AS supplierRisk,
             CASE WHEN s.countries IS NOT NULL AND
                       any(country IN s.countries WHERE country IN ['Country1', 'Country2'])
                  THEN 'Geopolitical risk identified'
                  ELSE NULL
             END AS geopoliticalRisk
        // Collect all risks by component
        WITH p, collect({
            component: c.name,
            supplier: s.name,
            supplierRisk: supplierRisk,
            geopoliticalRisk: geopoliticalRisk
        }) AS componentRisks
        // Check for customer dependencies
        OPTIONAL MATCH (customer:Customer)-[:MISSION_CRITICAL]->(p)
        WITH p, componentRisks, collect(customer) AS criticalCustomers
        // Check recent issues and incidents
        OPTIONAL MATCH (p)<-[:AFFECTS]-(i:Incident)
        WHERE i.date > datetime() - duration('P90D') // Last 90 days
        WITH p, componentRisks, criticalCustomers, collect(i) AS recentIncidents
        // Return a comprehensive risk assessment
        RETURN p.name AS productName,
               componentRisks,
               size(criticalCustomers) AS criticalCustomerCount,
               criticalCustomers,
               recentIncidents,
               CASE WHEN size(recentIncidents) > 3 THEN 'High incident rate risk'
                    WHEN size(recentIncidents) > 0 THEN 'Medium incident rate risk'
                    ELSE 'Low incident rate risk'
               END AS incidentRisk
        """
        with self.driver.session() as session:
            result = session.run(cypher_query, product_id=product_id)
            record = result.single()
            if not record:
                return None

        risk_assessment = {
            "product_name": record["productName"],
            "component_risks": record["componentRisks"],
            "critical_customer_count": record["criticalCustomerCount"],
            "critical_customers": [c["name"] for c in record["criticalCustomers"]],
            "recent_incidents": [{"id": i["id"], "description": i["description"],
                                  "date": i["date"], "severity": i["severity"]}
                                 for i in record["recentIncidents"]],
            "incident_risk_level": record["incidentRisk"]
        }

        # Determine the overall risk level
        risk_scores = {
            "High supplier reliability risk": 3,
            "Medium supplier reliability risk": 2,
            "Low supplier reliability risk": 1,
            "Geopolitical risk identified": 3,
            "High incident rate risk": 3,
            "Medium incident rate risk": 2,
            "Low incident rate risk": 1
        }
        risk_factors = [risk for component in record["componentRisks"]
                        for risk in [component["supplierRisk"], component["geopoliticalRisk"]]
                        if risk is not None]
        risk_factors.append(record["incidentRisk"])
        max_risk_score = max(risk_scores[factor] for factor in risk_factors)
        critical_factor = record["criticalCustomerCount"] > 5

        if max_risk_score == 3 or critical_factor:
            risk_assessment["overall_risk"] = "High"
        elif max_risk_score == 2:
            risk_assessment["overall_risk"] = "Medium"
        else:
            risk_assessment["overall_risk"] = "Low"
        return risk_assessment
```
Best Practices for Enterprise Knowledge Graphs
To maximize the value of knowledge graphs for your enterprise AI applications:
1. Start Small, Think Big
- Begin with a focused use case in a specific domain
- Design your ontology to be extensible as you grow
- Establish processes for continuous knowledge acquisition
2. Focus on Data Quality
- Implement robust entity resolution to avoid duplicates
- Establish validation rules for new knowledge
- Create feedback loops for knowledge correction
3. Integrate with Existing Systems
- Connect knowledge graphs to your data warehouses and lakes
- Establish APIs for applications to consume graph data
- Use change data capture for continuous updates
4. Involve Domain Experts
- Collaborate with subject matter experts to validate the ontology
- Create tools for expert knowledge contribution
- Establish governance for knowledge management
5. Measure Value and Impact
- Define KPIs for knowledge graph adoption and usage
- Track improvements in AI application performance
- Measure business outcomes from enhanced decision making
Decision Rules
Use this checklist for knowledge graph decisions:
- If your AI application needs company-specific context, add a knowledge graph layer
- If you have multiple data silos that need connecting, a knowledge graph often beats joins
- If explainability matters for AI decisions, use knowledge graphs to trace reasoning paths
- If domain expertise needs to be captured and applied, formalize it in an ontology first
- If query latency is critical, consider graph database indexing and query optimization
Knowledge graphs add operational complexity. Start with a focused use case before scaling.