How we killed our ETL pipeline (and productivity went up)

Simor Consulting | 26 May, 2026 | 05 Mins read

A B2B SaaS company running a customer success platform had a data pipeline that consumed sixty percent of the data engineering team’s time. Not feature work. Not analytics. Pipeline maintenance. The pipeline ingested data from the company’s own product database, transformed it through a chain of forty-seven SQL models, and wrote outputs to a warehouse that powered customer-facing dashboards. Every change to the product database schema risked breaking the pipeline. Every new metric required modifications to multiple models in the chain. The data team was a bottleneck for the entire company.

The pipeline ran nightly. When it broke — which happened an average of twice per week — the dashboards went stale. Customer success managers noticed within hours. Support tickets followed. The data team spent roughly twelve hours per week on pipeline triage and repair. This was not a scaling problem. The data volume was modest. This was an architectural problem: a monolithic pipeline that coupled every transformation to every other transformation through shared intermediate tables.

The coupling problem

The forty-seven SQL models formed a dependency chain. Model twelve consumed the output of models three and seven. Model twenty-nine consumed the output of models twelve, fifteen, and twenty-two. A schema change in the source database that affected model three propagated through the chain to every downstream model. The blast radius of any source change was unpredictable because the dependency graph was implicit — the SQL referenced table names, but there was no registry of which model produced which table.

When a product engineer added a column to the users table, the data team had to trace the impact through the model chain. Sometimes the column was not referenced by any downstream model, and no work was required. Sometimes it was referenced by model three, which fed model twelve, which fed model twenty-nine, which fed the customer-facing dashboard. The data team could not predict the blast radius without reading every model in the chain. This process took hours and was error-prone.
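The manual trace described above becomes a graph traversal once the dependencies are written down explicitly. The following is a minimal sketch, not the team's actual tooling: the registry format and the function names are hypothetical, and only the edges between models three, seven, twelve, fifteen, twenty-two, and twenty-nine come from the description above. The source tables other than users are placeholders.

```python
from collections import defaultdict, deque

# Hypothetical registry: each model lists the tables or models it reads from.
# The "accounts" and "events" sources and the "dashboard" node are illustrative.
MODEL_INPUTS = {
    "model_03": ["users"],
    "model_07": ["accounts"],
    "model_12": ["model_03", "model_07"],
    "model_15": ["events"],
    "model_22": ["events"],
    "model_29": ["model_12", "model_15", "model_22"],
    "dashboard": ["model_29"],
}


def build_consumers(model_inputs):
    """Invert the registry: map each node to the nodes that read from it."""
    consumers = defaultdict(set)
    for model, inputs in model_inputs.items():
        for source in inputs:
            consumers[source].add(model)
    return consumers


def blast_radius(changed, model_inputs):
    """Return every node transitively downstream of `changed` (breadth-first)."""
    consumers = build_consumers(model_inputs)
    affected = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, ()):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


print(sorted(blast_radius("users", MODEL_INPUTS)))
# ['dashboard', 'model_03', 'model_12', 'model_29']
```

With a registry like this, a change to the users table resolves to its downstream models in milliseconds rather than hours of reading SQL. The hard part in practice is keeping the registry accurate, which is what the implicit graph lacked.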

The root cause was a design that coupled transformations through shared state. Each model read from tables produced by other models and wrote tables that other models depended on. The models were not independent units of computation. They were a single monolithic program expressed as forty-seven separate SQL files.

The replacement: event-driven data contracts

We replaced the monolithic pipeline with an event-driven architecture built on data contracts. A data contract is an explicit agreement between a data producer and a data consumer about the schema, semantics, and SLA of a data product. The product team owned the contract for the source data. The data team owned the contracts for each transformed data product. When a contract changed, both parties negotiated the change before implementation.

Each transformation was an independent service that consumed events from a contract and produced events under a new contract. The contracts specified the schema, the update frequency, and the expected data quality rules. If a transformation consumed a contract that changed, the contract validation layer detected the drift at ingestion time and alerted the team before corrupted data reached downstream consumers.
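To make the idea of a contract and its validation layer concrete, here is a minimal sketch under stated assumptions. The `DataContract` class, the `validate_event` function, the field names, and the `health_score.v1` contract are all hypothetical illustrations, not the format used in the engagement. A real deployment would more likely rely on a schema registry and a serialization format with built-in schema evolution rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: schema, update frequency, and quality rules."""
    name: str
    schema: dict            # field name -> expected Python type
    update_frequency: str   # informational in this sketch
    quality_rules: dict = field(default_factory=dict)  # rule name -> predicate


def validate_event(contract, event):
    """Return a list of violations; an empty list means the event conforms."""
    violations = []
    for name, expected in contract.schema.items():
        if name not in event:
            violations.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            actual = type(event[name]).__name__
            violations.append(
                f"wrong type for {name}: expected {expected.__name__}, got {actual}"
            )
    for name in event:
        if name not in contract.schema:
            violations.append(f"unexpected field: {name}")
    # Quality rules assume a structurally valid event, so run them last.
    if not violations:
        for rule_name, predicate in contract.quality_rules.items():
            if not predicate(event):
                violations.append(f"quality rule failed: {rule_name}")
    return violations


health_score_contract = DataContract(
    name="health_score.v1",
    schema={"account_id": str, "score": float},
    update_frequency="continuous",
    quality_rules={"score_in_range": lambda e: 0.0 <= e["score"] <= 100.0},
)

good = {"account_id": "a-1", "score": 87.5}
drifted = {"account_id": "a-1", "score": "87.5"}  # upstream changed the type

print(validate_event(health_score_contract, good))     # []
print(validate_event(health_score_contract, drifted))
# ['wrong type for score: expected float, got str']
```

Rejecting or quarantining the drifted event at ingestion is what keeps it from reaching downstream consumers. The alerting path is omitted here.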

The critical difference from the monolithic pipeline was the absence of shared intermediate tables. Each transformation wrote to its own output topic. Downstream transformations consumed from output topics, not from shared staging tables. A change to one transformation’s logic could not break another transformation’s input, because each transformation’s input was a published contract with a defined schema and quality guarantees.

The migration sequence

We migrated one data product at a time, starting from the leaves of the dependency graph and working inward. The health score transformation was first because it had the fewest upstream dependencies. We built it as an independent service, validated its output against the monolithic pipeline’s output for two weeks, then switched the dashboard to the new source.

Each subsequent migration followed the same protocol. The new service ran in parallel with the monolithic pipeline. The output was compared nightly. When the comparison showed consistent agreement for two weeks, the dashboard was switched to the new source. The monolithic pipeline’s corresponding SQL model was then retired.
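The cutover rule in this protocol is simple enough to express as code. The sketch below assumes that "consistent agreement for two weeks" means fourteen consecutive matching nightly comparisons, and that each output can be reduced to a mapping from a key to a numeric value. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def rows_match(legacy_rows, candidate_rows, tolerance=1e-9):
    """Compare two nightly outputs keyed by id; values may differ by `tolerance`."""
    if legacy_rows.keys() != candidate_rows.keys():
        return False
    return all(
        abs(legacy_rows[key] - candidate_rows[key]) <= tolerance
        for key in legacy_rows
    )


def ready_for_cutover(nightly_results, required_streak=14):
    """True when the most recent `required_streak` comparisons all agreed."""
    if len(nightly_results) < required_streak:
        return False
    return all(nightly_results[-required_streak:])


# One night's comparison: the new service agrees with the monolith.
legacy = {"acct-1": 87.5, "acct-2": 42.0}
candidate = {"acct-1": 87.5, "acct-2": 42.0}
history = [rows_match(legacy, candidate)]

print(ready_for_cutover(history))      # False: only one night of evidence
print(ready_for_cutover([True] * 14))  # True: two full weeks of agreement
```

A single mismatch inside the window resets the clock, which is the conservative reading of the protocol. A team with noisier data might instead accept a small mismatch rate.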

The full migration took fourteen weeks. At the end, the forty-seven SQL models had been replaced by twelve independent transformation services, each with its own contract, its own deployment pipeline, and its own monitoring.

What we gave up

The event-driven architecture was more complex to reason about than the monolithic pipeline. With the monolith, a single SQL file described a complete transformation from source to output. With the event-driven architecture, a transformation’s logic was spread across an event consumer, a transformation function, and an event producer. Understanding the full data flow required reading contracts and tracing event streams rather than reading SQL.

The second trade-off was latency. The monolithic pipeline ran nightly. The event-driven architecture processed events continuously, which in principle allows near-real-time updates. But the contract validation layer added latency to every event. Total end-to-end latency from source event to dashboard update was fifteen minutes: far faster than nightly, but not the sub-second latency that an event-driven architecture might suggest.

The third trade-off was the contract negotiation process. When a product engineer wanted to change the schema of the users table, they now had to coordinate with the data team through the contract registry. This added a step to the product development workflow. Most engineers considered this a benefit — they knew exactly which downstream systems would be affected by their change — but some found it bureaucratic.

Results

Pipeline maintenance time dropped from twelve hours per week to under two hours. The twice-weekly pipeline failures disappeared because the independent services could not cascade failures to each other. When a transformation failed, it failed in isolation. Downstream services continued processing with the last known good data, and the failing service replayed from the event stream once the issue was resolved.
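The isolation and replay behavior can be illustrated with a small in-memory model. This is a sketch only: `EventLog` stands in for a durable topic and is not a client for any real broker, and the `Transformation` class, the event fields, and the scoring functions are hypothetical. The point it shows is that a failing service leaves its offset untouched, its output topic keeps the last good value, and a fixed version resumes from where the failure occurred.

```python
class EventLog:
    """In-memory stand-in for an append-only topic (not a real broker client)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]


class Transformation:
    def __init__(self, source, sink, fn):
        self.source, self.sink, self.fn = source, sink, fn
        self.offset = 0  # next unprocessed position in the source log

    def run(self):
        """Process pending events; stop at the first failure without skipping it."""
        for event in self.source.read_from(self.offset):
            try:
                result = self.fn(event)
            except Exception:
                return False  # offset unchanged, so the event is retried later
            self.sink.append(result)
            self.offset += 1
        return True


source, sink = EventLog(), EventLog()
for event in [{"logins": 4, "seats": 2}, {"logins": 3, "seats": 0}, {"logins": 6, "seats": 3}]:
    source.append(event)

# The buggy version divides by zero on the second event.
service = Transformation(source, sink, lambda e: e["logins"] / e["seats"])
first_run_ok = service.run()
last_known_good = sink.read_from(0)[-1]  # downstream still sees 2.0

# Deploy a fix and replay from the stored offset.
service.fn = lambda e: e["logins"] / e["seats"] if e["seats"] else 0.0
second_run_ok = service.run()

print(first_run_ok, last_known_good)      # False 2.0
print(second_run_ok, sink.read_from(0))   # True [2.0, 0.0, 2.0]
```

Because the source log is never mutated by the consumer, the failure stays local to one service, and nothing upstream or downstream needs to be rerun.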

Time-to-insight for new metrics dropped from an average of five days to under four hours. A product manager requesting a new metric could have a data product published within the same day, because the transformation was a new independent service rather than a modification to a fragile model chain.

The data team’s allocation shifted from sixty percent pipeline maintenance to fifteen percent. The freed forty-five percent of their capacity was redirected to feature development and advanced analytics.

The decision heuristic

When your ETL pipeline consumes more of your data team’s time than your data products, the pipeline has become the product, and it is time to replace it. The signal to watch is not pipeline failures. It is the ratio of time spent maintaining the pipeline to time spent building on top of it. When maintenance exceeds fifty percent, the architectural cost of the monolith has exceeded the cost of migrating to a decoupled system. Kill the pipeline before it kills the team.
