How we killed our ETL pipeline (and productivity went up)

Simor Consulting | 26 May, 2026 | 05 Mins read

A B2B SaaS company running a customer success platform had a data pipeline that consumed sixty percent of the data engineering team’s time. Not feature work. Not analytics. Pipeline maintenance. The pipeline ingested data from the company’s own product database, transformed it through a chain of forty-seven SQL models, and wrote outputs to a warehouse that powered customer-facing dashboards. Every change to the product database schema risked breaking the pipeline. Every new metric required modifications to multiple models in the chain. The data team was a bottleneck for the entire company.

The pipeline ran nightly. When it broke — which happened an average of twice per week — the dashboards went stale. Customer success managers noticed within hours. Support tickets followed. The data team spent roughly twelve hours per week on pipeline triage and repair. This was not a scaling problem. The data volume was modest. This was an architectural problem: a monolithic pipeline that coupled every transformation to every other transformation through shared intermediate tables.

The coupling problem

The forty-seven SQL models formed a dependency chain. Model twelve consumed the output of models three and seven. Model twenty-nine consumed the output of models twelve, fifteen, and twenty-two. A schema change in the source database that affected model three propagated through the chain to every downstream model. The blast radius of any source change was unpredictable because the dependency graph was implicit — the SQL referenced table names, but there was no registry of which model produced which table.

When a product engineer added a column to the users table, the data team had to trace the impact through the model chain. Sometimes the column was not referenced by any downstream model, and no work was required. Sometimes it was referenced by model three, which fed model twelve, which fed model twenty-nine, which fed the customer-facing dashboard. The data team could not predict the blast radius without reading every model in the chain. This process took hours and was error-prone.
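The missing registry can be approximated by parsing the SQL itself. The following is a minimal sketch, not the tooling used on this project: the model names, table names, and SQL are hypothetical, and the regular expressions only handle simple `CREATE TABLE ... FROM ... JOIN ...` statements.

```python
import re
from collections import defaultdict, deque

# Hypothetical model files (name -> SQL). Illustrative only, not the real models.
MODELS = {
    "model_03": "CREATE TABLE stg_users AS SELECT id, plan FROM users",
    "model_07": "CREATE TABLE stg_events AS SELECT user_id, ts FROM events",
    "model_12": "CREATE TABLE user_activity AS SELECT * FROM stg_users "
                "JOIN stg_events ON stg_users.id = stg_events.user_id",
    "model_29": "CREATE TABLE dashboard_health AS SELECT * FROM user_activity",
}

# Naive patterns: good enough for a sketch, not a real SQL parser.
CREATE_RE = re.compile(r"CREATE\s+TABLE\s+(\w+)", re.IGNORECASE)
READ_RE = re.compile(r"(?:FROM|JOIN)\s+(\w+)", re.IGNORECASE)


def build_registry(models):
    """Return (produces, reads): the table each model writes and the tables it reads."""
    produces, reads = {}, {}
    for name, sql in models.items():
        produces[name] = CREATE_RE.search(sql).group(1)
        reads[name] = set(READ_RE.findall(sql))
    return produces, reads


def blast_radius(source_table, models):
    """Return every model that depends, directly or transitively, on source_table."""
    produces, reads = build_registry(models)
    readers = defaultdict(set)  # table -> models that read it
    for name, tables in reads.items():
        for table in tables:
            readers[table].add(name)

    affected, queue = set(), deque([source_table])
    while queue:  # breadth-first walk over "table -> reader -> reader's output table"
        table = queue.popleft()
        for model in readers[table]:
            if model not in affected:
                affected.add(model)
                queue.append(produces[model])
    return affected


print(sorted(blast_radius("users", MODELS)))
```

With an explicit registry like this, the question "what does a change to the users table touch?" becomes a graph traversal rather than hours of reading SQL.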

The root cause was a design that coupled transformations through shared state. Each model read from tables produced by other models and wrote tables that other models depended on. The models were not independent units of computation. They were a single monolithic program expressed as forty-seven separate SQL files.

The replacement: event-driven data contracts

We replaced the monolithic pipeline with an event-driven architecture built on data contracts. A data contract is an explicit agreement between a data producer and a data consumer about the schema, semantics, and SLA of a data product. The product team owned the contract for the source data. The data team owned the contracts for each transformed data product. When a contract changed, both parties negotiated the change before implementation.
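To make the idea concrete, here is a minimal sketch of a contract as a data structure, with a check for changes that would need negotiation. The class names, field names, and the rule for what counts as "breaking" are assumptions made for illustration; the article does not prescribe a contract format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str    # schema: logical type such as "string" or "int"
    meaning: str  # semantics: what the value represents


@dataclass(frozen=True)
class DataContract:
    name: str
    version: int
    owner: str                  # the producing team
    fields: tuple               # tuple of FieldSpec
    max_staleness_minutes: int  # SLA: how old the data is allowed to be


def breaking_changes(old, new):
    """List changes that consumers would have to agree to before rollout.

    Assumption for this sketch: removing a field, changing its type, or
    loosening the SLA is breaking; adding a field is not.
    """
    old_fields = {f.name: f for f in old.fields}
    new_fields = {f.name: f for f in new.fields}
    problems = []
    for name, spec in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name].dtype != spec.dtype:
            problems.append(
                f"type change: {name} {spec.dtype} -> {new_fields[name].dtype}"
            )
    if new.max_staleness_minutes > old.max_staleness_minutes:
        problems.append("looser SLA: max_staleness_minutes increased")
    return problems


# Hypothetical contract for the users source data.
users_v1 = DataContract(
    name="product.users", version=1, owner="product-team",
    fields=(
        FieldSpec("id", "string", "stable user identifier"),
        FieldSpec("plan", "string", "current subscription plan"),
    ),
    max_staleness_minutes=60,
)
users_v2 = DataContract(
    name="product.users", version=2, owner="product-team",
    fields=(
        FieldSpec("id", "string", "stable user identifier"),
        FieldSpec("plan_tier", "string", "current subscription plan"),
    ),
    max_staleness_minutes=60,
)

print(breaking_changes(users_v1, users_v2))
```

A rename shows up as a removed field, which is exactly the kind of change the two teams would discuss before it ships.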

Each transformation was an independent service that consumed events published under one contract and produced events under a new contract. The contracts specified the schema, the update frequency, and the expected data quality rules. If a transformation consumed a contract that changed, the contract validation layer detected the drift at ingestion time and alerted the team before corrupted data reached downstream consumers.
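A validation layer of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the validation layer built for the client: the contract name, the fields, and the quality rule are hypothetical, and a real system would alert rather than just collect rejections.

```python
# Hypothetical contract: schema as field -> Python type, plus named quality rules.
CONTRACT = {
    "name": "product.account_usage.v1",
    "schema": {"account_id": str, "active_users": int, "event_time": str},
    "rules": {
        "active_users is non-negative": lambda e: e["active_users"] >= 0,
    },
}


def validate(event, contract):
    """Return a list of violations; an empty list means the event conforms."""
    schema = contract["schema"]
    violations = []
    for name, expected in schema.items():
        if name not in event:
            violations.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            violations.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(event[name]).__name__}"
            )
    for name in event.keys() - schema.keys():
        violations.append(f"unexpected field: {name}")
    # Quality rules only make sense once the shape is known to be right.
    if not violations:
        for label, rule in contract["rules"].items():
            if not rule(event):
                violations.append(f"rule failed: {label}")
    return violations


def ingest(events, contract):
    """Split events into accepted ones and (event, violations) rejections."""
    accepted, rejected = [], []
    for event in events:
        problems = validate(event, contract)
        if problems:
            rejected.append((event, problems))
        else:
            accepted.append(event)
    return accepted, rejected


good = {"account_id": "a1", "active_users": 5, "event_time": "2026-05-26T00:00:00Z"}
drifted = {"account_id": "a2", "active_users": "5", "event_time": "2026-05-26T00:00:00Z"}
accepted, rejected = ingest([good, drifted], CONTRACT)
print(len(accepted), rejected[0][1])
```

The point of checking at ingestion is that the drifted event never reaches a transformation, so nothing downstream computes on it.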

The critical difference from the monolithic pipeline was the absence of shared intermediate tables. Each transformation wrote to its own output topic. Downstream transformations consumed from output topics, not from shared staging tables. A change to one transformation’s logic could not break another transformation’s input, because each transformation’s input was a published contract with a defined schema and quality guarantees.
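The shape of this arrangement can be shown with an in-memory stand-in for an event stream. The sketch below assumes nothing about the actual streaming platform used; the `Topic` and `Transformation` classes, the topic names, and the scoring logic are all hypothetical.

```python
class Topic:
    """Append-only in-memory log standing in for a real event stream."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def publish(self, event):
        self.events.append(event)

    def read_from(self, offset):
        return self.events[offset:]


class Transformation:
    """Reads one topic, applies a function, and writes only to its own topic."""

    def __init__(self, source, output, fn):
        self.source, self.output, self.fn = source, output, fn
        self.offset = 0  # position of the next unread event in the source topic

    def run(self):
        for event in self.source.read_from(self.offset):
            self.output.publish(self.fn(event))
            self.offset += 1


# Hypothetical topics: one source, then one output topic per transformation.
usage = Topic("product.usage")
health = Topic("cs.health_score")
alerts = Topic("cs.churn_alerts")


def to_health(e):
    return {"account_id": e["account_id"], "score": min(100, e["active_users"] * 10)}


def to_alert(e):
    return {"account_id": e["account_id"], "at_risk": e["score"] < 30}


health_service = Transformation(usage, health, to_health)
alert_service = Transformation(health, alerts, to_alert)

usage.publish({"account_id": "a1", "active_users": 2})
usage.publish({"account_id": "a2", "active_users": 12})
health_service.run()
alert_service.run()
print(alerts.events)
```

The alert service knows nothing about how health scores are computed. It depends only on the shape of events in the health topic, so the scoring logic can change freely as long as that shape holds.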

The migration sequence

We migrated one data product at a time, starting from the leaves of the dependency graph and working inward. The health score transformation was first because it had the fewest upstream dependencies. We built it as an independent service, validated its output against the monolithic pipeline’s output for two weeks, then switched the dashboard to the new source.

Each subsequent migration followed the same protocol. The new service ran in parallel with the monolithic pipeline. The output was compared nightly. When the comparison showed consistent agreement for two weeks, the dashboard was switched to the new source. The monolithic pipeline’s corresponding SQL model was then retired.
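The cut-over rule can be sketched as two small functions: one compares a night's outputs, the other checks for an unbroken run of agreement. This is an illustration, not the comparison job we ran. The key column, the numeric tolerance, and the reading of "two weeks" as fourteen consecutive nightly comparisons are assumptions.

```python
import math


def rows_agree(old_rows, new_rows, key="account_id", tolerance=1e-9):
    """True if both outputs contain the same keys with matching values."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    if old.keys() != new.keys():
        return False
    for k, old_row in old.items():
        new_row = new[k]
        if old_row.keys() != new_row.keys():
            return False
        for column, old_val in old_row.items():
            new_val = new_row[column]
            both_numeric = isinstance(old_val, (int, float)) and isinstance(
                new_val, (int, float)
            )
            if both_numeric:
                # Allow for tiny floating-point differences between engines.
                if not math.isclose(old_val, new_val, abs_tol=tolerance):
                    return False
            elif old_val != new_val:
                return False
    return True


def ready_to_cut_over(nightly_results, required_streak=14):
    """nightly_results is a list of booleans, oldest first.

    Only the most recent unbroken run of agreement counts, so a single
    mismatch restarts the clock.
    """
    streak = 0
    for agreed in reversed(nightly_results):
        if not agreed:
            break
        streak += 1
    return streak >= required_streak


old_output = [{"account_id": "a1", "score": 72.0}]
new_output = [{"account_id": "a1", "score": 72.0}]
history = [rows_agree(old_output, new_output)] * 14
print(ready_to_cut_over(history))
```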

The full migration took fourteen weeks. At the end, the forty-seven SQL models had been replaced by twelve independent transformation services, each with its own contract, its own deployment pipeline, and its own monitoring.

What we gave up

The event-driven architecture was more complex to reason about than the monolithic pipeline. With the monolith, each transformation step was a single SQL file that could be read from top to bottom. With the event-driven architecture, a transformation’s logic was spread across an event consumer, a transformation function, and an event producer. Understanding the full data flow required reading contracts and tracing event streams rather than reading SQL.

The second trade-off was latency. The monolithic pipeline ran nightly. The event-driven architecture processed events continuously, which should have made updates close to immediate. But the contract validation layer added latency to every event. Total end-to-end latency from source event to dashboard update was fifteen minutes: far faster than nightly, but not the sub-second latency that an event-driven architecture might suggest.

The third trade-off was the contract negotiation process. When a product engineer wanted to change the schema of the users table, they now had to coordinate with the data team through the contract registry. This added a step to the product development workflow. Most engineers considered this a benefit — they knew exactly which downstream systems would be affected by their change — but some found it bureaucratic.

Results

Pipeline maintenance time dropped from twelve hours per week to under two hours. The twice-weekly pipeline-wide failures disappeared because the independent services could not cascade failures to each other. When a transformation failed, it failed in isolation. Downstream services continued processing with the last known good data, and the failing service replayed from the event stream once the issue was resolved.
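Isolation and replay both follow from one mechanism: a service only advances its position in the stream after an event has been processed successfully. The sketch below illustrates that idea with a plain list as the stream. The `Service` class, the events, and the bug are hypothetical.

```python
class Service:
    """Processes a stream from a committed offset and stops at the first failure."""

    def __init__(self, fn):
        self.fn = fn
        self.offset = 0   # index of the next event to process
        self.output = []  # what downstream consumers can read

    def process(self, stream):
        while self.offset < len(stream):
            try:
                result = self.fn(stream[self.offset])
            except Exception:
                # Offset is not advanced, so this event is retried on the next run.
                return False
            self.output.append(result)
            self.offset += 1
        return True


# Hypothetical seat-utilisation events; the second one triggers the bug.
stream = [
    {"seats": 10, "active": 5},
    {"seats": 0, "active": 0},
    {"seats": 4, "active": 4},
]


def buggy(e):
    return e["active"] / e["seats"]  # raises ZeroDivisionError when seats == 0


def fixed(e):
    return e["active"] / e["seats"] if e["seats"] else 0.0


svc = Service(buggy)
ok_first = svc.process(stream)         # fails on the second event
last_known_good = list(svc.output)     # downstream keeps serving this

svc.fn = fixed                         # deploy the fix
ok_second = svc.process(stream)        # resumes at the failed event, not from zero
print(ok_first, last_known_good, ok_second, svc.output)
```

Nothing the service had already published is withdrawn when it fails, and nothing is skipped when it recovers.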

Time-to-insight for new metrics dropped from an average of five days to under four hours. A product manager requesting a new metric could have a data product published within the same day, because the transformation was a new independent service rather than a modification to a fragile model chain.

The data team’s allocation to pipeline maintenance shifted from sixty percent to fifteen percent. The freed forty-five percent of their capacity was redirected to feature development and advanced analytics.

The decision heuristic

When your ETL pipeline consumes more of your data team’s time than your data products, the pipeline has become the product, and it is time to replace it. The signal to watch is not pipeline failures. It is the ratio of time spent maintaining the pipeline to time spent building on top of it. When maintenance exceeds fifty percent, the architectural cost of the monolith has exceeded the cost of migrating to a decoupled system. Kill the pipeline before it kills the team.
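As a worked version of the heuristic, the check reduces to one ratio. The function names and the weekly hours in the example are hypothetical; only the fifty percent threshold and the sixty and fifteen percent shares come from the case above.

```python
def maintenance_share(maintenance_hours, building_hours):
    """Fraction of the team's time that goes to keeping the pipeline alive."""
    total = maintenance_hours + building_hours
    if total == 0:
        raise ValueError("no hours recorded")
    return maintenance_hours / total


def should_replace(share, threshold=0.5):
    """The heuristic: replace once maintenance takes more than half the time."""
    return share > threshold


# Hypothetical week: 24 hours of maintenance against 16 hours of building.
before = maintenance_share(24, 16)  # a sixty percent share, as in the case above
after = maintenance_share(6, 34)    # a fifteen percent share after the migration
print(should_replace(before), should_replace(after))
```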
