Semantic Layers & Metrics Stores: dbt Semantic Layer, Cube, Transform

Simor Consulting | 07 Nov, 2025 | 5 min read

Every team has their own definition of “revenue.” The CFO calculates it one way, marketing another, and product a third. Each calculation is technically correct—they just use different definitions, time windows, and data sources. This isn’t just frustrating; it paralyzes decision-making. When the CEO asks for customer acquisition cost, three executives present three different numbers, and the board meeting devolves into debate about whose spreadsheet is right.

The root cause is architectural: metrics are defined in each tool, not centralized. Every dashboard, every BI tool, every spreadsheet computes metrics independently. Changes require updates in dozens of places.

Semantic layers solve this by centralizing metric definitions and exposing them through a single interface.

The Metrics Chaos Problem

As companies democratize data access, they create metric chaos:

  • Metric Proliferation: Hundreds of variations of “revenue” across tools
  • Definition Drift: Metrics evolve differently in each system
  • Conflicting Truth: Same question, different answers
  • Maintenance Nightmare: Changes require dozens of updates
  • Trust Erosion: Executives stop believing the data

Root causes:

Decentralized Definitions: Marketing counts trial users as customers, finance counts only paying users, and product counts anyone who logged in. All are “customers” in their contexts.

Tool Proliferation: Tableau, Looker, PowerBI, Excel, custom dashboards—all calculating independently.

Temporal Confusion: When does a customer become a customer? At signup? First purchase? After trial?

Context Loss: Raw SQL strips away business context.
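These decentralized definitions are easy to reproduce in miniature. In the toy example below (records and rules are hypothetical), all three teams query the same rows and still disagree on how many customers exist:

```python
# Toy illustration: three teams, one dataset, three "customer" counts.
users = [
    {"id": 1, "status": "trial",   "logged_in": True},
    {"id": 2, "status": "paying",  "logged_in": True},
    {"id": 3, "status": "trial",   "logged_in": False},
    {"id": 4, "status": "churned", "logged_in": True},
]

# Marketing counts trials and paying users; finance counts only paying;
# product counts anyone who logged in.
marketing_customers = [u for u in users if u["status"] in ("trial", "paying")]
finance_customers = [u for u in users if u["status"] == "paying"]
product_customers = [u for u in users if u["logged_in"]]

print(len(marketing_customers), len(finance_customers), len(product_customers))
# → 3 1 3
```

No team is wrong; without a shared definition, all three numbers are "correct."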

Organizations report:

  • 40 hours/week reconciling metrics across teams
  • Millions annually in delayed decisions
  • 30% of analyst time recreating existing metrics

Semantic Layer Architecture

A semantic layer sits between raw data and business users, translating technical schemas into business concepts:

Single Source of Truth: One definition for each metric, accessible everywhere.

Business-Friendly Abstractions: Users work with entities like “customers,” “orders,” “revenue”—not tables and joins.

Governance with Flexibility: Central teams define core metrics; teams can extend and explore.

Performance at Scale: Caching, aggregation, and query optimization for fast responses.
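The caching idea behind that last point can be sketched in a few lines (a deliberately minimal TTL cache, not any vendor's implementation; real engines add pre-aggregation and invalidation on top):

```python
import time

class MetricCache:
    """Minimal TTL cache: serve repeated metric queries without re-hitting the warehouse."""
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # cache_key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        now = time.time()
        if entry and entry[0] > now:
            return entry[1]                       # fresh cache hit
        value = compute()                         # cache miss: run the real query
        self.store[key] = (now + self.ttl, value)
        return value

cache = MetricCache(ttl_seconds=60)
calls = []

def fake_warehouse_query():
    calls.append(1)   # count how often we actually hit the warehouse
    return 42_000

a = cache.get_or_compute("revenue|month|2025-10", fake_warehouse_query)
b = cache.get_or_compute("revenue|month|2025-10", fake_warehouse_query)
print(a, b, len(calls))  # second call is served from cache: 42000 42000 1
```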

Implementing dbt Semantic Layer

Defining Semantic Models

# models/semantic/customers.yml
semantic_models:
  - name: customers
    description: "Customer entities with key attributes and metrics"

    model: ref('fct_customers')

    entities:
      - name: customer
        type: primary
        expr: customer_id

    dimensions:
      - name: created_date
        type: time
        expr: created_at
        type_params:
          time_granularity: day

      - name: acquisition_channel
        type: categorical
        expr: acquisition_channel

      - name: lifetime_value_tier
        type: categorical
        expr: |
          CASE
            WHEN lifetime_value > 1000 THEN 'High'
            WHEN lifetime_value > 100 THEN 'Medium'
            ELSE 'Low'
          END

    measures:
      - name: customer_count
        description: "Count of unique customers"
        agg: count_distinct
        expr: customer_id

      - name: total_lifetime_value
        description: "Sum of all customer lifetime values"
        agg: sum
        expr: lifetime_value

      - name: average_order_value
        description: "Average order value per customer"
        agg: average
        expr: lifetime_value / order_count

      - name: total_marketing_spend
        description: "Total marketing spend attributed to acquired customers"
        agg: sum
        expr: marketing_spend

Creating Metrics

# models/semantic/metrics.yml
metrics:
  - name: customer_acquisition_cost
    label: "Customer Acquisition Cost (CAC)"
    description: "Total marketing spend divided by new customers acquired"
    model: ref('marketing_spend')

    calculation_method: derived
    expression: "{{metric('total_marketing_spend')}} / {{metric('new_customers')}}"

    timestamp: date_day
    time_grains: [day, week, month, quarter, year]

    dimensions:
      - acquisition_channel
      - campaign_type
      - geographic_region

    filters:
      - field: is_valid_attribution
        operator: "="
        value: "true"

    meta:
      owner: "marketing_analytics"
      tier: 1
      refresh_frequency: "hourly"

  - name: customer_lifetime_value
    label: "Customer Lifetime Value (CLV)"
    description: "Predicted total value of a customer over their lifetime"
    model: ref('customer_ltv_model')

    calculation_method: average
    expression: predicted_ltv

    timestamp: cohort_month
    time_grains: [month, quarter, year]

    dimensions:
      - acquisition_channel
      - customer_segment
      - first_product_category

  - name: cac_to_ltv_ratio
    label: "CAC to LTV Ratio"
    description: "Efficiency of customer acquisition spend"

    calculation_method: derived
    expression: |
      {{metric('customer_acquisition_cost')}} /
      {{metric('customer_lifetime_value')}}

    meta:
      target_range: [0.2, 0.33]
      alert_threshold: 0.5
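The meta block on cac_to_ltv_ratio suggests how downstream monitoring might consume it. A hedged sketch (the alerting policy here is an assumption, not a dbt feature):

```python
def check_ratio(cac, ltv, target_range=(0.2, 0.33), alert_threshold=0.5):
    """Evaluate CAC/LTV against the target range and alert threshold from the metric's meta."""
    ratio = cac / ltv
    if ratio >= alert_threshold:
        return ratio, "alert"       # spending too much to acquire
    if target_range[0] <= ratio <= target_range[1]:
        return ratio, "healthy"
    return ratio, "watch"

print(check_ratio(cac=90, ltv=300))   # (0.3, 'healthy')
print(check_ratio(cac=180, ltv=300))  # (0.6, 'alert')
```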

Exposing Metrics via API

from dbt_semantic_layer import SemanticLayerClient

class MetricsService:
    def __init__(self, config):
        self.client = SemanticLayerClient(
            environment_id=config.environment_id,
            auth_token=config.auth_token
        )

    def get_metric(self, metric_name, dimensions=None, filters=None, time_grain='day'):
        """Retrieve metric values with optional dimensions and filters"""

        query = self.client.query()
        query.select(metric=metric_name)

        if dimensions:
            for dim in dimensions:
                query.group_by(dimension=dim)

        if filters:
            for filter_config in filters:
                query.where(
                    dimension=filter_config['dimension'],
                    operator=filter_config['operator'],
                    value=filter_config['value']
                )

        query.time_dimension(
            dimension='metric_time',
            granularity=time_grain
        )

        results = query.run()
        return self.format_results(results)

    def format_results(self, results):
        """Normalize raw query results into a list of row dicts."""
        return [dict(row) for row in results]

Building with Cube

Data Modeling in Cube

// schema/Customers.js
cube(`Customers`, {
  sql: `SELECT * FROM analytics.fct_customers`,

  preAggregations: {
    customerMetrics: {
      measures: [
        Customers.count,
        Customers.totalLifetimeValue,
        Customers.averageOrderValue,
      ],
      dimensions: [Customers.acquisitionChannel],
      timeDimension: Customers.createdAt,
      granularity: `day`,
      refreshKey: {
        every: `1 hour`,
      },
    },
  },

  joins: {
    Orders: {
      sql: `${CUBE}.customer_id = ${Orders}.customer_id`,
      relationship: `hasMany`,
    },

    MarketingSpend: {
      sql: `${CUBE}.acquisition_channel = ${MarketingSpend}.channel
             AND ${CUBE}.created_month = ${MarketingSpend}.month`,
      relationship: `hasOne`,
    },
  },

  measures: {
    count: {
      type: `count`,
      drillMembers: [id, createdAt, acquisitionChannel],
    },

    totalLifetimeValue: {
      sql: `lifetime_value`,
      type: `sum`,
      format: `currency`,
    },

    averageOrderValue: {
      sql: `${totalLifetimeValue} / ${Orders.count}`,
      type: `number`,
      format: `currency`,
    },

    acquisitionCost: {
      sql: `${MarketingSpend.totalSpend} / ${count}`,
      type: `number`,
      format: `currency`,
      description: `Average cost to acquire a customer`,
    },
  },

  dimensions: {
    id: {
      sql: `customer_id`,
      type: `string`,
      primaryKey: true,
    },

    createdAt: {
      sql: `created_at`,
      type: `time`,
    },

    createdMonth: {
      sql: `DATE_TRUNC('month', created_at)`,
      type: `time`,
    },

    acquisitionChannel: {
      sql: `acquisition_channel`,
      type: `string`,
      title: `Acquisition Channel`,
    },

    ltvTier: {
      sql: `
        CASE
          WHEN lifetime_value > 1000 THEN 'High'
          WHEN lifetime_value > 100 THEN 'Medium'
          ELSE 'Low'
        END
      `,
      type: `string`,
      title: `LTV Tier`,
    },
  },
});
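Once this schema is deployed, any client can request the same members through Cube's REST API (GET /cubejs-api/v1/load). A sketch of the query shape, using the member names defined above and an illustrative date range:

```python
import json
from urllib.parse import urlencode

# Query shape for Cube's REST API; member names match the Customers cube above.
query = {
    "measures": ["Customers.count", "Customers.averageOrderValue"],
    "dimensions": ["Customers.acquisitionChannel"],
    "timeDimensions": [{
        "dimension": "Customers.createdAt",
        "granularity": "month",
        "dateRange": "last 6 months",  # illustrative range
    }],
}

# Cube accepts the query URL-encoded as JSON:
# GET /cubejs-api/v1/load?query=<encoded>
encoded = urlencode({"query": json.dumps(query)})
```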

Advanced Caching Strategies

// schema/Orders.js
cube(`Orders`, {
  sql: `SELECT * FROM analytics.fct_orders`,

  preAggregations: {
    // Real-time aggregation for recent data
    recentOrders: {
      measures: [Orders.count, Orders.totalRevenue, Orders.averageOrderValue],
      dimensions: [Orders.status],
      timeDimension: Orders.orderDate,
      granularity: `hour`,
      refreshKey: {
        every: `5 minutes`,
      },
      buildRangeStart: {
        sql: `SELECT NOW() - INTERVAL '7 days'`,
      },
      buildRangeEnd: {
        sql: `SELECT NOW()`,
      },
    },

    // Historical aggregation for older data
    historicalOrders: {
      measures: [Orders.count, Orders.totalRevenue],
      dimensions: [Customers.acquisitionChannel],
      timeDimension: Orders.orderDate,
      granularity: `month`,
      refreshKey: {
        every: `1 day`,
      },
      buildRangeStart: {
        sql: `SELECT DATE_TRUNC('month', MIN(order_date)) FROM analytics.fct_orders`,
      },
      buildRangeEnd: {
        sql: `SELECT NOW() - INTERVAL '7 days'`,
      },
    },
  },
});
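Cube matches incoming queries to these rollups automatically; the toy router below (boundary and dates are illustrative) just makes the partitioning explicit: ranges inside the seven-day hot window hit the hourly rollup, older ranges hit the monthly one.

```python
from datetime import date, timedelta

def pick_rollup(query_start, query_end, today, boundary_days=7):
    """Route a date range to the hourly 'recent' rollup or the monthly 'historical' one."""
    boundary = today - timedelta(days=boundary_days)
    if query_start >= boundary:
        return "recentOrders"      # entirely inside the hot window
    if query_end < boundary:
        return "historicalOrders"  # entirely in cold history
    return "both"                  # spans the boundary: stitch both rollups

today = date(2025, 11, 7)
print(pick_rollup(date(2025, 11, 5), date(2025, 11, 7), today))  # recentOrders
print(pick_rollup(date(2025, 1, 1), date(2025, 6, 30), today))   # historicalOrders
```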

Implementing Transform

MetricFlow Models

# transform/models/customer_metrics.yml
models:
  - name: customer_facts
    description: "Core customer facts and attributes"

    entities:
      - name: customer
        type: primary
        expr: customer_id

    dimensions:
      - name: created_date
        type: time
        type_params:
          time_granularity: day

      - name: acquisition_channel
        type: categorical

    measures:
      - name: customers_created
        description: "New customers created"
        agg: count
        expr: customer_id

      - name: lifetime_value_sum
        description: "Total lifetime value"
        agg: sum
        expr: lifetime_value

metrics:
  - name: new_customers
    description: "Count of new customers acquired"
    type: simple
    type_params:
      measure: customers_created

  - name: average_ltv
    description: "Average customer lifetime value"
    type: derived
    type_params:
      expr: lifetime_value_sum / customers_created
      metrics:
        - lifetime_value_sum
        - customers_created

  - name: cac
    description: "Customer acquisition cost"
    type: ratio
    type_params:
      numerator:
        name: marketing_spend
        filter: "campaign_type = 'acquisition'"
      denominator:
        name: new_customers
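To make the cac definition concrete, here is the same calculation spelled out over a handful of invented rows: sum spend where campaign_type = 'acquisition', divide by the new-customer count.

```python
# Invented rows standing in for the marketing_spend and customer tables.
spend_rows = [
    {"campaign_type": "acquisition", "amount": 5_000},
    {"campaign_type": "acquisition", "amount": 3_000},
    {"campaign_type": "retention",   "amount": 2_000},  # excluded by the metric's filter
]
new_customer_ids = {"c1", "c2", "c3", "c4"}

# Numerator: filtered sum; denominator: distinct new customers.
acquisition_spend = sum(r["amount"] for r in spend_rows
                        if r["campaign_type"] == "acquisition")
cac = acquisition_spend / len(new_customer_ids)
print(cac)  # 8000 / 4 = 2000.0
```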

Metric Versioning and Governance

from datetime import datetime

class MetricGovernance:
    def __init__(self, transform_client):
        self.client = transform_client
        self.metric_registry = {}

    def register_metric(self, metric_definition):
        """Register a new metric with governance checks"""

        # Validate metric definition
        validation_result = self.validate_metric(metric_definition)
        if not validation_result.is_valid:
            raise ValueError(f"Invalid metric: {validation_result.errors}")

        # Check for conflicts
        conflicts = self.check_conflicts(metric_definition)
        if conflicts:
            raise ValueError(f"Metric conflicts with: {conflicts}")

        # Version the metric
        versioned_metric = self.version_metric(metric_definition)

        # Register with Transform
        self.client.create_metric(versioned_metric)

        # Update registry
        self.metric_registry[metric_definition['name']] = versioned_metric

        # Notify stakeholders
        self.notify_metric_change(versioned_metric, 'created')

        return versioned_metric

    def deprecate_metric(self, metric_name, replacement=None):
        """Deprecate a metric with migration path"""

        metric = self.metric_registry.get(metric_name)
        if not metric:
            raise ValueError(f"Metric {metric_name} not found")

        # Set deprecation metadata
        metric['status'] = 'deprecated'
        metric['deprecated_at'] = datetime.now()
        metric['replacement'] = replacement

        # Update all dependent metrics
        dependents = self.find_dependent_metrics(metric_name)
        for dependent in dependents:
            self.update_dependent_metric(dependent, metric_name, replacement)

        # Schedule removal
        self.schedule_metric_removal(metric_name, days=90)

        return metric
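The version_metric helper is referenced but never shown; a minimal sketch under an assumed semver policy (major bump when the expression changes, minor bump otherwise):

```python
import copy

def version_metric(metric, registry):
    """Assign a version: major bump if the expression changed, else minor (assumed policy)."""
    metric = copy.deepcopy(metric)
    previous = registry.get(metric["name"])
    if previous is None:
        metric["version"] = "1.0"
    else:
        major, minor = (int(x) for x in previous["version"].split("."))
        if metric.get("expression") != previous.get("expression"):
            metric["version"] = f"{major + 1}.0"        # breaking: numbers will change
        else:
            metric["version"] = f"{major}.{minor + 1}"  # metadata-only change
    return metric

registry = {}
v1 = version_metric({"name": "cac", "expression": "spend / customers"}, registry)
registry["cac"] = v1
v2 = version_metric({"name": "cac", "expression": "spend / new_customers"}, registry)
print(v1["version"], v2["version"])  # 1.0 2.0
```

Versioning the definition, not just the value, is what lets dependents migrate on their own schedule.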

Unified Architecture

[Diagram: unified semantic layer architecture]

Decision Rules

Adopt a semantic layer when:

  • Metric discrepancies cause business conflict
  • Multiple BI tools compute the same metrics differently
  • Metric definitions change frequently and updates are error-prone
  • Business users need self-service analytics without SQL knowledge
  • Governance requires audit trails for metric calculations

Stick with direct querying when:

  • Organization is small with few analysts
  • Metrics are stable and rarely change
  • Single source of truth is already established
  • Overhead of maintaining semantic layer isn’t justified

Key principles:

  • Start with high-value metrics causing the most conflict
  • Involve business stakeholders in metric definitions
  • Version everything—metrics evolve
  • Performance matters: caching and pre-aggregation enable adoption
  • Governance enables innovation, not just control

Ready to Implement These AI Data Engineering Solutions?

Get a comprehensive AI Readiness Assessment to determine the best approach for your organization's data infrastructure and AI implementation needs.
