When your AI vendor goes bankrupt — surviving platform lock-in

Simor Consulting | 23 Jun, 2026 | 05 Mins read

A healthcare analytics company received notice on a Tuesday afternoon that their primary AI infrastructure vendor was filing for Chapter 7 bankruptcy. The platform hosted their patient risk stratification models, their clinical trial matching service, and their population health analytics pipeline. The vendor’s service would shut down in ninety days, so the company had to migrate every workload before then or lose it.

The company’s engineering team had six people. The platform hosted fourteen production models, eight data pipelines, and three customer-facing APIs. The vendor had been selected three years earlier because it offered a managed ML platform that abstracted away infrastructure complexity. The team had never provisioned their own GPU instances, managed their own container orchestration, or configured their own model serving infrastructure. The abstraction that made the vendor attractive now made the migration harder, because the team had no muscle memory for the infrastructure layer.

The lock-in anatomy

The vendor’s platform had three layers of lock-in. The first was infrastructure lock-in. Models were trained and served using the vendor’s proprietary orchestration layer. The team submitted training jobs through the vendor’s SDK. Model artifacts were stored in the vendor’s registry. Inference endpoints were exposed through the vendor’s API. None of these interfaces were portable. The team could export model weights, but the serving configuration — batch sizes, timeout settings, autoscaling rules — was encoded in the vendor’s proprietary format.

The second was data pipeline lock-in. The ETL pipelines were built using the vendor’s visual pipeline designer. The pipeline definitions were stored as JSON documents that described the vendor’s specific operator vocabulary. The operators were wrappers around standard transformations — read from S3, filter rows, join tables — but the wrapping was proprietary. Exporting the pipeline definitions did not produce runnable code. It produced a proprietary configuration that only the vendor’s runtime could execute.

The third was integration lock-in. Customer-facing APIs were routed through the vendor’s API gateway, which handled authentication, rate limiting, and request logging. The company’s API keys, customer configurations, and usage tracking were managed in the vendor’s system. Migrating the APIs meant rebuilding the integration layer from scratch.

The migration plan

We designed a ninety-day migration with three parallel workstreams: model serving, data pipelines, and API gateway. Each workstream had a dedicated engineer from the company’s team and a consultant. The total migration team was twelve people.

The model serving workstream had the most time pressure. The fourteen production models needed to be re-deployed on a portable serving infrastructure. We selected a Kubernetes-based serving stack using open-source tooling: KServe for model serving, MLflow for model registry, and Prometheus for monitoring. The critical step was extracting model weights and converting the serving configurations from the vendor’s proprietary format to the open-source format. This took three weeks for the fourteen models, with the most complex model requiring four days of configuration mapping.
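The configuration mapping step can be illustrated with a short sketch. The vendor's proprietary format is not shown here, so every key in `VENDOR_CONFIG` is a hypothetical stand-in, and the model name and storage URI are placeholders. The output is shaped like a KServe `InferenceService` manifest; the field names used (`minReplicas`, `maxReplicas`, `timeout`, `batcher.maxBatchSize`, `storageUri`) should be checked against the KServe version in use.

```python
from typing import Any

# Hypothetical vendor export. The real proprietary format is not shown in the
# article, so every key below is an illustrative assumption.
VENDOR_CONFIG: dict[str, Any] = {
    "model_name": "risk-stratification",
    "framework": "sklearn",
    "artifact_uri": "s3://example-bucket/models/risk-stratification",
    "max_batch_size": 32,
    "timeout_seconds": 30,
    "autoscaling": {"min_replicas": 2, "max_replicas": 8},
}


def to_inference_service(vendor: dict[str, Any]) -> dict[str, Any]:
    """Map one vendor serving config to a KServe-style manifest dict."""
    scaling = vendor["autoscaling"]
    if scaling["min_replicas"] > scaling["max_replicas"]:
        raise ValueError("min_replicas exceeds max_replicas")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": vendor["model_name"]},
        "spec": {
            "predictor": {
                # Autoscaling rules carried over from the vendor config.
                "minReplicas": scaling["min_replicas"],
                "maxReplicas": scaling["max_replicas"],
                # Request timeout, in seconds.
                "timeout": vendor["timeout_seconds"],
                # Batch size carried over from the vendor config.
                "batcher": {"maxBatchSize": vendor["max_batch_size"]},
                "model": {
                    "modelFormat": {"name": vendor["framework"]},
                    "storageUri": vendor["artifact_uri"],
                },
            }
        },
    }


manifest = to_inference_service(VENDOR_CONFIG)
```

Keeping the mapping in one function per vendor format means each of the fourteen models can be converted and reviewed the same way, with the manifest serialized to YAML afterwards.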

The data pipeline workstream had the highest volume of work. The eight pipelines, described in the vendor’s proprietary JSON format, had to be reconstructed as code. We chose to rebuild them as Python scripts orchestrated by Airflow, because Python was the most readable option for the company’s team and Airflow was operationally simple. The team reverse-engineered each pipeline’s logic from the JSON definitions, wrote equivalent Python code, and validated output equivalence against the vendor’s platform. This process took five weeks.
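The reverse-engineering step amounts to mapping each proprietary operator onto a plain function. The sketch below assumes a hypothetical JSON definition with two invented operators, `filter_rows` and `select`; the vendor's actual operator vocabulary is not reproduced here, and the Airflow orchestration around the functions is omitted.

```python
import json

# Hypothetical pipeline export; operator names and fields are assumptions.
PIPELINE_JSON = """
{"steps": [
  {"op": "filter_rows", "column": "age", "at_least": 65},
  {"op": "select", "columns": ["patient_id", "age"]}
]}
"""


def filter_rows(rows, step):
    """Keep rows whose column value is at least the configured threshold."""
    return [r for r in rows if r[step["column"]] >= step["at_least"]]


def select(rows, step):
    """Keep only the configured columns."""
    return [{c: r[c] for c in step["columns"]} for r in rows]


# One plain-Python translation per vendor operator.
OPERATORS = {"filter_rows": filter_rows, "select": select}


def run_pipeline(definition, rows):
    """Apply each step in order, failing loudly on untranslated operators."""
    for step in definition["steps"]:
        try:
            operator = OPERATORS[step["op"]]
        except KeyError:
            raise ValueError(f"no translation for operator {step['op']!r}") from None
        rows = operator(rows, step)
    return rows


ROWS = [
    {"patient_id": "p1", "age": 70, "zip": "02139"},
    {"patient_id": "p2", "age": 40, "zip": "10001"},
]
result = run_pipeline(json.loads(PIPELINE_JSON), ROWS)
```

Failing on an unknown operator, rather than skipping it, makes gaps in the translation visible before any output is compared against the vendor's platform.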

The API gateway workstream had the most customer impact. Three customer-facing APIs needed to maintain their existing endpoints, authentication mechanisms, and response formats during the migration. We deployed a proxy layer that routed requests to either the vendor’s gateway or the new gateway based on a feature flag. This allowed a gradual cutover with the ability to roll back instantly if the new gateway produced errors.
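A minimal sketch of flag-based routing follows. It assumes a percentage-style flag per API and a stable hash of the customer identifier, which is one common way to implement gradual cutover and not necessarily how this proxy was built. The gateway URLs and API names are placeholders.

```python
import hashlib

# Placeholder upstreams; the real gateway addresses are not given in the article.
VENDOR_GATEWAY = "https://vendor-gateway.example.com"
NEW_GATEWAY = "https://new-gateway.example.com"

# Hypothetical flag store: API name -> percentage of traffic on the new gateway.
# Setting a value back to 0 is the rollback.
ROLLOUT = {"risk-api": 0, "trial-matching-api": 25, "population-api": 100}


def bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_upstream(api: str, customer_id: str, rollout=ROLLOUT) -> str:
    """Pick the gateway for a request; unknown APIs stay on the vendor."""
    percent = rollout.get(api, 0)
    return NEW_GATEWAY if bucket(customer_id) < percent else VENDOR_GATEWAY
```

Hashing the customer identifier keeps each customer on the same gateway between requests, so a customer does not alternate between two systems while a cutover is partially rolled out.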

The timeline

Days 1 through 14: assessment and planning. Inventory every model, pipeline, and API. Map each to its dependencies. Identify the migration order based on business criticality and technical complexity.

Days 15 through 45: parallel migration. The three workstreams operated simultaneously. Weekly integration tests verified that models served through the new infrastructure produced outputs matching the vendor’s platform. Pipeline outputs were compared row-by-row against the vendor’s outputs.
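The row-by-row comparison can be sketched as a keyed diff. This version assumes each output row is a dict with a unique key column and that small floating-point differences between runtimes are acceptable; the tolerance, column names, and sample rows are illustrative.

```python
import math
from numbers import Real


def _same(a, b, rel_tol):
    """Compare two cell values, using a tolerance for numeric pairs."""
    numeric = (
        isinstance(a, Real) and isinstance(b, Real)
        and not isinstance(a, bool) and not isinstance(b, bool)
    )
    return math.isclose(a, b, rel_tol=rel_tol) if numeric else a == b


def diff_rows(vendor_rows, new_rows, key, rel_tol=1e-9):
    """Return (key, column, detail) tuples for every mismatch found."""
    vendor = {row[key]: row for row in vendor_rows}
    new = {row[key]: row for row in new_rows}
    mismatches = []
    for k in sorted(vendor.keys() | new.keys()):
        if k not in vendor or k not in new:
            side = "new" if k not in new else "vendor"
            mismatches.append((k, None, f"row missing from {side} output"))
            continue
        for col in sorted(vendor[k].keys() | new[k].keys()):
            a, b = vendor[k].get(col), new[k].get(col)
            if not _same(a, b, rel_tol):
                mismatches.append((k, col, f"{a!r} != {b!r}"))
    return mismatches


vendor_out = [{"id": 1, "score": 0.1 + 0.2}, {"id": 2, "score": 0.5}]
new_out = [{"id": 1, "score": 0.3}, {"id": 2, "score": 0.6}, {"id": 3, "score": 0.9}]
report = diff_rows(vendor_out, new_out, key="id")
```

In this example the first row passes despite floating-point noise, the second is reported as a value mismatch, and the third is reported as missing from the vendor output.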

Days 46 through 60: integration testing. The three workstreams were connected end-to-end. Customer-facing APIs were pointed at the new model serving infrastructure. Pipelines fed data to the new model registry. End-to-end latency and accuracy were measured against the vendor’s baseline.

Days 61 through 75: staged cutover. One customer-facing API was migrated per week, starting with the lowest-traffic endpoint. Each cutover was monitored for forty-eight hours before proceeding to the next.

Days 76 through 90: cleanup and hardening. The vendor’s platform was used as a read-only fallback while the new infrastructure was hardened. On day 89, the final API was cut over. On day 90, the vendor’s platform went offline.

What we gave up

The vendor’s platform had automated model retraining, which the new infrastructure did not replicate at first. Retraining was paused for the duration of the migration, and the team accepted model staleness as a cost of the accelerated timeline. Retraining was restored six weeks after the migration completed.

The vendor’s platform had a visual pipeline designer that non-engineers could use. The new Airflow-based pipelines required Python skills that not everyone on the team had. Two business analysts who had maintained pipelines through the vendor’s visual tool could no longer do so. The team hired an additional data engineer to absorb this work.

The vendor’s platform provided managed infrastructure with automatic patching and scaling. The new Kubernetes-based infrastructure required the team to manage upgrades, security patches, and capacity planning. Ongoing operational overhead increased by roughly twenty hours per month.

The lasting change

After the migration, the company adopted a vendor portability policy. Every new vendor engagement required that all model artifacts, pipeline definitions, and integration configurations be exportable in a non-proprietary format. Vendors that stored customer assets in proprietary formats were disqualified regardless of feature advantages.

The policy cost the company access to some best-of-breed platforms that used proprietary abstractions. It also meant that the team maintained more infrastructure knowledge in-house rather than outsourcing it to a managed platform. The team accepted this trade-off. Having lived through a ninety-day emergency migration, they preferred operational overhead to existential vendor risk.

The decision heuristic

If your AI infrastructure is hosted on a proprietary platform, audit your portability today. Export a model. Export a pipeline definition. Export an API configuration. If any of these exports cannot run without the vendor’s runtime, you have lock-in. Quantify the lock-in in weeks: how many weeks would it take to rebuild these assets on open infrastructure? If the answer is more than twelve weeks, you are one vendor bankruptcy away from a crisis. Fix the portability gap before you need it, because the day you need it, you will not have time to do it carefully.
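The audit can be reduced to a small calculation. In the sketch below, the asset list and the week estimates are made-up placeholders to be replaced with your own figures, and summing the estimates assumes the rebuilds happen one after another; only the twelve-week threshold comes from the heuristic above.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    runs_without_vendor: bool  # did the exported asset run off-platform?
    rebuild_weeks: float       # estimated weeks to rebuild on open infrastructure


THRESHOLD_WEEKS = 12  # threshold from the heuristic above


def lock_in_weeks(assets):
    """Sum rebuild estimates for assets that need the vendor's runtime."""
    return sum(a.rebuild_weeks for a in assets if not a.runs_without_vendor)


# Illustrative placeholder estimates, not measurements.
ASSETS = [
    Asset("model weights", True, 0),
    Asset("serving configuration", False, 3),
    Asset("pipeline definitions", False, 5),
    Asset("API gateway configuration", False, 6),
]

total_weeks = lock_in_weeks(ASSETS)
at_risk = total_weeks > THRESHOLD_WEEKS
```

If several rebuilds can run in parallel with separate teams, the longest single estimate is a closer measure of elapsed time than the sum.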
