A B2B SaaS company running a customer success platform had a data pipeline that consumed sixty percent of the data engineering team’s time. Not feature work. Not analytics. Pipeline maintenance. The pipeline ingested data from the company’s own product database, transformed it through a chain of forty-seven SQL models, and wrote outputs to a warehouse that powered customer-facing dashboards. Every change to the product database schema risked breaking the pipeline. Every new metric required modifications to multiple models in the chain. The data team was a bottleneck for the entire company.
The pipeline ran nightly. When it broke — which happened an average of twice per week — the dashboards went stale. Customer success managers noticed within hours. Support tickets followed. The data team spent roughly twelve hours per week on pipeline triage and repair. This was not a scaling problem. The data volume was modest. This was an architectural problem: a monolithic pipeline that coupled every transformation to every other transformation through shared intermediate tables.
The coupling problem
The forty-seven SQL models formed a dependency graph. Model twelve consumed the output of models three and seven. Model twenty-nine consumed the output of models twelve, fifteen, and twenty-two. A schema change in the source database that affected model three propagated to every model downstream of it. The blast radius of any source change was unpredictable because the dependency graph was implicit: the SQL referenced table names, but there was no registry of which model produced which table.
When a product engineer altered a column in the users table, the data team had to trace the impact through the model graph. Sometimes the column was not referenced by any downstream model, and no work was required. Sometimes it was referenced by model three, which fed model twelve, which fed model twenty-nine, which fed the customer-facing dashboard. The data team could not predict the blast radius without reading every model in the graph. This process took hours and was error-prone.
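The manual tracing described above amounts to a graph traversal that nobody had written down. Below is a minimal sketch of making that graph explicit, assuming each model writes a single table named after itself. The model names and SQL are hypothetical, and the regular expression is a rough stand-in for a real SQL parser: it works at table level only and will miss subqueries, CTEs, and quoted identifiers.

```python
import re
from collections import defaultdict, deque

# Hypothetical models: each one writes a table named after itself.
MODELS = {
    "model_03": "SELECT id, plan FROM users",
    "model_07": "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id",
    "model_12": "SELECT a.id, b.n FROM model_03 a JOIN model_07 b ON a.id = b.user_id",
    "model_29": "SELECT id, n FROM model_12",
}

# Crude table-reference matcher; an assumption, not a SQL parser.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)


def build_consumers(models):
    """Map each table name to the set of models that read from it."""
    consumers = defaultdict(set)
    for name, sql in models.items():
        for table in TABLE_REF.findall(sql):
            consumers[table].add(name)
    return consumers


def blast_radius(table, consumers):
    """Return every model reachable downstream of `table`, sorted by name."""
    seen, queue = set(), deque([table])
    while queue:
        current = queue.popleft()
        for model in consumers.get(current, ()):
            if model not in seen:
                seen.add(model)
                queue.append(model)
    return sorted(seen)


consumers = build_consumers(MODELS)
print(blast_radius("users", consumers))
```

With a registry like this, the question "what does a change to the users table touch?" becomes a lookup rather than hours of reading SQL.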
The root cause was a design that coupled transformations through shared state. Each model read from tables produced by other models and wrote tables that other models depended on. The models were not independent units of computation. They were a single monolithic program expressed as forty-seven separate SQL files.
The replacement: event-driven data contracts
We replaced the monolithic pipeline with an event-driven architecture built on data contracts. A data contract is an explicit agreement between a data producer and a data consumer about the schema, semantics, and SLA of a data product. The product team owned the contract for the source data. The data team owned the contracts for each transformed data product. When a contract changed, both parties negotiated the change before implementation.
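To make the idea concrete, here is a hedged sketch of what a contract and a change check might look like. The `DataContract` fields, the `breaking_changes` helper, and the rules for what counts as breaking are all illustrative assumptions, not the registry the team actually used.

```python
from dataclasses import dataclass, replace


@dataclass
class DataContract:
    name: str
    version: int
    owner: str
    schema: dict                # column name -> type name
    semantics: dict             # column name -> plain-language meaning
    freshness_sla_minutes: int  # maximum acceptable staleness


def breaking_changes(old, new):
    """List changes in `new` that could break consumers of `old`.

    Assumed policy: added columns are safe; removed columns, changed
    types, and a loosened freshness SLA require negotiation.
    """
    problems = []
    for column, col_type in old.schema.items():
        if column not in new.schema:
            problems.append(f"removed column: {column}")
        elif new.schema[column] != col_type:
            problems.append(f"type change: {column} {col_type} -> {new.schema[column]}")
    if new.freshness_sla_minutes > old.freshness_sla_minutes:
        problems.append("freshness SLA loosened")
    return problems


users_v1 = DataContract(
    name="users",
    version=1,
    owner="product-team",
    schema={"id": "int", "plan": "str"},
    semantics={"id": "stable user identifier", "plan": "current billing plan"},
    freshness_sla_minutes=15,
)
# Adding a column is non-breaking under the assumed policy.
users_v2 = replace(users_v1, version=2, schema={"id": "int", "plan": "str", "region": "str"})
# Changing a type and dropping a column both need negotiation.
users_v3 = replace(users_v1, version=3, schema={"id": "str"})

print(breaking_changes(users_v1, users_v2))
print(breaking_changes(users_v1, users_v3))
```

An empty list means the producer can ship the change without coordination; a non-empty list is the agenda for the negotiation.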
Each transformation was an independent service that consumed events from a contract and produced events under a new contract. The contracts specified the schema, the update frequency, and the expected data quality rules. If a transformation consumed a contract that changed, the contract validation layer detected the drift at ingestion time and alerted the team before corrupted data reached downstream consumers.
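A validation layer of this kind can be sketched in a few lines. Everything named here (`Contract`, `validate`, `ingest`, the `alert` callback, the health-score fields) is hypothetical, and treating unexpected fields as drift is a policy choice made for the example rather than something the source specifies.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Contract:
    schema: dict[str, type]  # field name -> expected Python type
    quality_rules: dict[str, Callable[[dict], bool]] = field(default_factory=dict)


def validate(event: dict, contract: Contract) -> list[str]:
    """Return a list of contract violations; an empty list means the event is valid."""
    violations = []
    for column, expected in contract.schema.items():
        if column not in event:
            violations.append(f"missing field: {column}")
        elif not isinstance(event[column], expected):
            violations.append(f"wrong type: {column}")
    for extra in sorted(event.keys() - contract.schema.keys()):
        violations.append(f"unexpected field: {extra}")
    if not violations:
        # Quality rules only run on structurally valid events.
        for rule_name, rule in contract.quality_rules.items():
            if not rule(event):
                violations.append(f"quality rule failed: {rule_name}")
    return violations


def ingest(events, contract, alert):
    """Pass valid events through; report invalid ones instead of forwarding them."""
    accepted = []
    for event in events:
        violations = validate(event, contract)
        if violations:
            alert(event, violations)
        else:
            accepted.append(event)
    return accepted


health_contract = Contract(
    schema={"account_id": int, "score": float},
    quality_rules={"score_in_range": lambda e: 0.0 <= e["score"] <= 100.0},
)
```

The point of the design is that rejected events never reach the transformation, so drift shows up as an alert at the boundary rather than as a wrong number on a dashboard.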
The critical difference from the monolithic pipeline was the absence of shared intermediate tables. Each transformation wrote to its own output topic. Downstream transformations consumed from output topics, not from shared staging tables. A change to one transformation’s logic could not break another transformation’s input, because each transformation’s input was a published contract with a defined schema and quality guarantees.
The migration sequence
We migrated one data product at a time, starting from the leaves of the dependency graph and working inward. The health score transformation was first because it had the fewest upstream dependencies. We built it as an independent service, validated its output against the monolithic pipeline’s output for two weeks, then switched the dashboard to the new source.
Each subsequent migration followed the same protocol. The new service ran in parallel with the monolithic pipeline. The output was compared nightly. When the comparison showed consistent agreement for two weeks, the dashboard was switched to the new source. The monolithic pipeline’s corresponding SQL model was then retired.
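The parallel-run protocol can be expressed as two small checks: one that compares a night's outputs, and one that decides whether the agreement streak is long enough. This is a sketch under assumptions: rows are dictionaries keyed by a hypothetical `account_id`, "two weeks" is read as fourteen consecutive nightly comparisons, and the float tolerance is arbitrary.

```python
def outputs_agree(legacy_rows, new_rows, key, tolerance=1e-9):
    """Compare two nightly outputs row by row, matching rows on `key`."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    if legacy.keys() != new.keys():
        return False
    for k, old_row in legacy.items():
        new_row = new[k]
        if old_row.keys() != new_row.keys():
            return False
        for column, old_value in old_row.items():
            new_value = new_row[column]
            if isinstance(old_value, float) and isinstance(new_value, float):
                # Allow tiny numeric differences between the two engines.
                if abs(old_value - new_value) > tolerance:
                    return False
            elif old_value != new_value:
                return False
    return True


def ready_to_cut_over(nightly_results, required_streak=14):
    """True once the most recent `required_streak` nightly comparisons all agreed."""
    if len(nightly_results) < required_streak:
        return False
    return all(nightly_results[-required_streak:])
```

A single disagreement inside the window resets the clock, which matches the requirement of consistent agreement rather than agreement on most nights.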
The full migration took fourteen weeks. At the end, the forty-seven SQL models had been replaced by twelve independent transformation services, each with its own contract, its own deployment pipeline, and its own monitoring.
What we gave up
The event-driven architecture was more complex to reason about than the monolithic pipeline. With the monolith, a single SQL file described a complete transformation from source to output. With the event-driven architecture, a transformation’s logic was spread across an event consumer, a transformation function, and an event producer. Understanding the full data flow required reading contracts and tracing event streams rather than reading SQL.
The second trade-off was latency. The monolithic pipeline ran nightly. The event-driven architecture processed events continuously, which should have been faster. But the contract validation layer added latency to every event. Total end-to-end latency from source event to dashboard update was fifteen minutes — faster than nightly, but not the sub-second latency that an event-driven architecture might suggest.
The third trade-off was the contract negotiation process. When a product engineer wanted to change the schema of the users table, they now had to coordinate with the data team through the contract registry. This added a step to the product development workflow. Most engineers considered this a benefit — they knew exactly which downstream systems would be affected by their change — but some found it bureaucratic.
Results
Pipeline maintenance time dropped from twelve hours per week to under two hours per week. The twice-weekly pipeline-wide outages disappeared because a failure in one independent service could not cascade to the others. When a transformation failed, it failed in isolation. Downstream services continued processing with the last known good data, and the failing service replayed from the event stream once the issue was resolved.
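The isolation-and-replay behavior can be illustrated with an in-memory stand-in for an event stream. The `IsolatedConsumer` class and the list used as a log are assumptions for illustration; a real deployment would rely on the offset tracking of whatever streaming platform is in use.

```python
class IsolatedConsumer:
    """Consumes an append-only event log and keeps the last good output."""

    def __init__(self, transform):
        self.transform = transform
        self.offset = 0        # index of the next unprocessed event
        self.last_good = None  # what downstream readers see during an outage

    def poll(self, log):
        """Process new events; stop at the first failure without skipping it."""
        while self.offset < len(log):
            try:
                self.last_good = self.transform(log[self.offset])
            except Exception:
                # Offset is not advanced, so the event is retried on the next poll.
                return False
            self.offset += 1
        return True


log = [{"score": 80}, {"score": None}, {"score": 90}]
consumer = IsolatedConsumer(lambda e: e["score"] * 2)

healthy = consumer.poll(log)  # fails on the second event, keeps the first result

# Once the transformation is fixed, polling again replays from the stored offset.
consumer.transform = lambda e: (e["score"] or 0) * 2
recovered = consumer.poll(log)
```

Because the offset only advances after a successful transformation, no event is lost during the outage, and downstream readers keep seeing `last_good` until the replay catches up.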
Time-to-insight for new metrics dropped from an average of five days to under four hours. A product manager requesting a new metric could have a data product published within the same day, because the transformation was a new independent service rather than a modification to a fragile model chain.
The data team’s allocation shifted from sixty percent pipeline maintenance to fifteen percent. The freed forty-five percent of their capacity was redirected to feature development and advanced analytics.
The decision heuristic
When your ETL pipeline consumes more of your data team’s time than your data products, the pipeline has become the product, and it is time to replace it. The signal to watch is not pipeline failures. It is the ratio of time spent maintaining the pipeline to time spent building on top of it. When maintenance exceeds fifty percent, the architectural cost of the monolith has exceeded the cost of migrating to a decoupled system. Kill the pipeline before it kills the team.
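The heuristic reduces to a single ratio that can be tracked from time logs. The sketch below assumes hours are split into two buckets, maintenance and building. The function names and the way time is categorized are illustrative, and the fifty percent threshold is the rule of thumb from this article rather than a universal constant.

```python
def maintenance_share(maintenance_hours, build_hours):
    """Fraction of total logged time spent keeping the pipeline running."""
    total = maintenance_hours + build_hours
    if total <= 0:
        raise ValueError("no hours logged")
    return maintenance_hours / total


def should_replace_pipeline(maintenance_hours, build_hours, threshold=0.5):
    """Apply the heuristic: replace once maintenance exceeds the threshold share."""
    return maintenance_share(maintenance_hours, build_hours) > threshold


# Shares from this case study, expressed per 100 hours of team time.
before = should_replace_pipeline(60, 40)  # sixty percent maintenance
after = should_replace_pipeline(15, 85)   # fifteen percent maintenance
```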