Data Council 2026 wrapped in Austin last week, and the signal-to-noise ratio was higher than in recent years. The conference has historically been the venue where data infrastructure practitioners — not vendors, not analysts — discuss what actually works in production. This year, three themes dominated the talks, hallway conversations, and unconference sessions.
Theme One: The Metadata Platform Is Dead
The most provocative claim came from a joint talk by two platform leads from mid-stage startups: the metadata platform as a product category is finished. Not because metadata does not matter, but because metadata has been absorbed into the data platforms themselves.
Three years ago, the data catalog was a separate product. You bought or deployed Alation, Atlan, or DataHub to manage your metadata. The argument at Data Council 2026 is that this category is collapsing because Snowflake, Databricks, and BigQuery now provide built-in metadata management, data lineage, and data discovery. The standalone metadata platform only made sense when the data warehouse did not track its own metadata.
The counterargument, voiced in the unconference, is that cross-platform metadata still requires a dedicated tool. If your data lives in Snowflake and S3 and a PostgreSQL instance and a Kafka cluster, no single platform’s built-in metadata covers your full data landscape. The reality is probably a middle ground: built-in metadata for single-platform shops, cross-platform tools for heterogeneous environments, and a shrinking market for the standalone catalog product.
The practitioner takeaway: if you are evaluating a metadata platform, ask whether the problem is truly cross-platform or whether your primary data warehouse has added the capabilities you need since your last evaluation.
Theme Two: Data Contracts Are Finally Getting Teeth
Data contracts have been discussed at every data conference for three years. The consistent complaint has been that contracts are a good idea with no enforcement mechanism. At Data Council 2026, three teams presented production implementations where data contracts are enforced at the schema level, with automated breaking-change detection and producer-side CI gates.
The pattern that works: define the contract as a schema with explicit quality guarantees (freshness, completeness, uniqueness). Store the contract in version control alongside the producing service. Add a CI check that compares the contract against the actual output schema. When a producer changes its output in a way that violates the contract, the CI check fails and the change cannot be merged.
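The presenting teams' code is not reproduced here. What follows is a minimal sketch of the CI check in Python, assuming both the contract and the producer's actual output schema can be loaded as a mapping from column name to type and nullability. The `Column` class, the `orders` columns, and the `breaking_changes` and `check` functions are hypothetical names chosen for illustration. The sketch treats a removed column, a changed type, or a column that becomes nullable as breaking, and treats an added column as non-breaking. That is one common policy, not the only one.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dtype: str
    nullable: bool = False


# Hypothetical contract for an "orders" data product. In practice this would be
# loaded from a file versioned alongside the producing service.
CONTRACT: dict[str, Column] = {
    "order_id": Column("string"),
    "customer_id": Column("string"),
    "amount_cents": Column("int64"),
    "created_at": Column("timestamp"),
}


def breaking_changes(contract: dict[str, Column], actual: dict[str, Column]) -> list[str]:
    """List every way the actual output schema violates the contract."""
    problems = []
    for name, expected in contract.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"{name}: removed from output")
        elif got.dtype != expected.dtype:
            problems.append(f"{name}: type changed {expected.dtype} -> {got.dtype}")
        elif got.nullable and not expected.nullable:
            problems.append(f"{name}: became nullable")
    # Columns present in `actual` but absent from the contract are treated as
    # additive, non-breaking changes and are not reported.
    return problems


def check(actual: dict[str, Column]) -> int:
    """Return a process exit code: 0 if the contract holds, 1 otherwise."""
    problems = breaking_changes(CONTRACT, actual)
    for problem in problems:
        print(f"contract violation: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Simulated producer change: amount_cents switched to a float and a new
# nullable column was added. Only the type change is breaking.
proposed = {**CONTRACT, "amount_cents": Column("float64"), "coupon": Column("string", nullable=True)}
exit_code = check(proposed)  # prints one violation to stderr, returns 1
```

In a real CI job the final step would be `sys.exit(check(actual))`, so a non-zero code blocks the merge. Where `actual` comes from (a schema registry, a dry run of the producing job) depends on the stack.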
This is not a new idea. What is new is that the tooling has matured to the point where implementation is practical. Schema registries, data observability platforms, and pipeline frameworks now provide the hooks needed for contract enforcement. The teams that presented had each built their enforcement in under a month.
The practitioner takeaway: if you have been waiting for data contracts to mature, the tooling is ready. Start with your highest-value data products, define contracts as schemas with quality SLAs, and enforce them in CI.
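The quality SLAs named earlier (freshness, completeness, uniqueness) can be checked on each delivered batch in a similar way. This is a minimal sketch, assuming rows arrive as Python dicts with a timezone-aware `created_at` field. The thresholds, column names, and the `check_quality` function are hypothetical; real values would come from the versioned contract.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLA thresholds; real values would live in the versioned contract.
MAX_STALENESS = timedelta(hours=2)
REQUIRED_COLUMNS = ("order_id", "customer_id", "amount_cents")
MIN_COMPLETENESS = 0.99
UNIQUE_KEY = "order_id"


def check_quality(rows: list[dict], now: datetime) -> list[str]:
    """Return a list of SLA failures for one delivered batch (empty list = pass)."""
    if not rows:
        return ["no rows delivered"]
    failures = []

    # Freshness: the newest record must be no older than MAX_STALENESS.
    stamps = [r["created_at"] for r in rows if r.get("created_at") is not None]
    if not stamps:
        failures.append("freshness: no created_at values")
    elif now - max(stamps) > MAX_STALENESS:
        failures.append(f"freshness: newest row is {now - max(stamps)} old")

    # Completeness: share of rows with every required column populated.
    complete = sum(all(r.get(c) is not None for c in REQUIRED_COLUMNS) for r in rows)
    ratio = complete / len(rows)
    if ratio < MIN_COMPLETENESS:
        failures.append(f"completeness: {ratio:.1%} < {MIN_COMPLETENESS:.0%}")

    # Uniqueness: non-null key values must not repeat.
    keys = [r[UNIQUE_KEY] for r in rows if r.get(UNIQUE_KEY) is not None]
    duplicates = len(keys) - len(set(keys))
    if duplicates:
        failures.append(f"uniqueness: {duplicates} duplicate {UNIQUE_KEY} value(s)")

    return failures


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": "a1", "customer_id": "c1", "amount_cents": 500, "created_at": now - timedelta(minutes=30)},
    {"order_id": "a1", "customer_id": None, "amount_cents": 700, "created_at": now - timedelta(hours=1)},
]
print(check_quality(batch, now))
# ['completeness: 50.0% < 99%', 'uniqueness: 1 duplicate order_id value(s)']
```

Returning a list of failures rather than raising on the first one lets a CI gate or a pipeline alert report every broken guarantee at once.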
Theme Three: The AI Data Pipeline Is Not Your ETL Pipeline
The third dominant theme was the distinction between traditional ETL pipelines and AI data pipelines. The argument, made most clearly by a principal engineer from a large fintech, is that teams are making a category error when they try to run AI workloads on their existing data infrastructure.
Traditional ETL pipelines move structured data from source to destination on a schedule. AI data pipelines manage unstructured data, embedding generation, vector index updates, model training data curation, and evaluation dataset maintenance. The throughput, latency, and quality requirements are different. The failure modes are different. The monitoring requirements are different.
Several speakers argued that the “AI data pipeline” should be treated as a separate infrastructure concern with its own tooling, monitoring, and ownership. Trying to force AI data flows through Airflow DAGs designed for batch SQL transformations creates fragility that does not appear until the system is under load.
The practitioner takeaway: if you are building RAG pipelines, fine-tuning workflows, or agent data flows, evaluate whether your existing pipeline infrastructure is the right tool. In many cases, purpose-built AI data pipeline tools — vector database change feeds, embedding pipelines, evaluation harnesses — are more appropriate than extending your ETL platform.
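One concrete difference in failure modes is what a re-run costs. Re-running an ETL job that overwrites a table is often cheap and idempotent. Re-embedding an entire corpus is not, and documents deleted at the source leave stale vectors behind unless the pipeline removes them explicitly. The following is a minimal sketch of an incremental embedding sync that tracks a content hash per document, so only new or changed text is embedded and orphaned vectors are dropped. It assumes a hypothetical batch `embed` callable and uses a plain dict in place of a vector index; no specific vector database API is implied.

```python
import hashlib
from typing import Callable

Vector = list[float]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_index(
    docs: dict[str, str],
    index: dict[str, tuple[str, Vector]],
    embed: Callable[[list[str]], list[Vector]],
) -> dict[str, int]:
    """Bring `index` (doc_id -> (hash, vector)) in line with `docs` (doc_id -> text)."""
    hashes = {doc_id: content_hash(text) for doc_id, text in docs.items()}
    # New documents, or documents whose text changed since they were embedded.
    changed = [d for d in docs if d not in index or index[d][0] != hashes[d]]
    # Documents that disappeared from the source but still have vectors.
    removed = [d for d in index if d not in docs]

    if changed:
        vectors = embed([docs[d] for d in changed])  # one batched call
        for doc_id, vector in zip(changed, vectors):
            index[doc_id] = (hashes[doc_id], vector)
    for doc_id in removed:
        del index[doc_id]

    return {"embedded": len(changed), "deleted": len(removed), "unchanged": len(docs) - len(changed)}


# Stand-in embedding function that records batch sizes (hypothetical).
calls: list[int] = []


def fake_embed(texts: list[str]) -> list[Vector]:
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


index: dict[str, tuple[str, Vector]] = {}
first = sync_index({"a": "hello", "b": "world"}, index, fake_embed)
print(first)   # {'embedded': 2, 'deleted': 0, 'unchanged': 0}
second = sync_index({"a": "hello", "c": "new doc"}, index, fake_embed)
print(second)  # {'embedded': 1, 'deleted': 1, 'unchanged': 1}
```

The counts returned by each sync are also a natural monitoring signal: a sudden spike in `embedded` usually means an upstream change touched every document, which is exactly the kind of cost event a batch ETL monitor would not flag.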
The Unconference Signal
The unconference sessions surfaced two concerns that did not make it into formal talks but dominated informal discussion.
First, the cost of running AI workloads is creating budget pressure that is forcing data teams to make difficult prioritization decisions. Several teams reported that their AI infrastructure costs exceeded their traditional data warehouse costs for the first time in Q1 2026, and that leadership was questioning the ROI of AI investments that had been approved with optimistic projections.
Second, the talent market for data engineers with AI infrastructure skills is extremely tight. Teams are struggling to hire engineers who understand both traditional data systems and the new AI stack. The gap is not in ML engineering (building models) but in data engineering for AI (building the infrastructure that models run on).
Bounded Recommendation
The most actionable signal from Data Council 2026 is that data contracts are ready for production adoption. If you have been considering contracts, stop considering and start implementing. The second actionable signal is that AI data pipelines need dedicated infrastructure and ownership. If your AI data flows are running on your ETL platform as a temporary measure, the temporary measure has expired.