Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

AI Infrastructure Model Gateway

OpenAI vs Anthropic vs Google: Model Provider Failover Strategies

29 Jun, 2026 | 10 Mins read

Every major model provider has had outages. OpenAI has gone down during peak hours. Anthropic has experienced degraded performance. Google Gemini has had API issues. If your application depends on a s

Thought Leadership Data Culture

Why I stopped chasing the latest AI framework

29 Jun, 2026 | 04 Mins read

In 2023, I rewrote a data pipeline three times because the framework landscape kept shifting. First it was built on LangChain. Then the team wanted to switch to LlamaIndex because it handled retrieval

Tooling AI Infrastructure

LLM gateway comparison: LiteLLM, Portkey, Martian

29 Jun, 2026 | 07 Mins read

A production AI application calls multiple LLM providers. The primary model is GPT-4o for complex reasoning, but simple classification tasks use Claude Haiku for cost savings, and the fallback for rat

AI Infrastructure Agent Orchestration

A2A and MCP: How Agent-to-Agent Protocol Fits the Control Layer Model

28 Jun, 2026 | 09 Mins read

Google announced the Agent-to-Agent protocol, A2A, as a standard for how AI agents communicate with each other. This sits alongside the Model Context Protocol, MCP, which standardizes how agents acces

AI Enablement Operations

Your first 90 days as a Head of AI Engineering

28 Jun, 2026 | 07 Mins read

The first Head of AI Engineering at a company inherits one of three situations. Situation one: there is no AI team, no AI infrastructure, and the mandate is to build from scratch. Situation two: there

AI Infrastructure Operations

AI Rollback Patterns: When to Roll Back a Prompt, a Model, or the Whole Release

27 Jun, 2026 | 11 Mins read

Software rollbacks are well-understood. You deploy a new version, detect an issue, and roll back to the previous version. The rollback is atomic: the entire application reverts to the previous state.

Trends AI Governance

Sovereign AI: why countries are building their own models

27 Jun, 2026 | 03 Mins read

France released a fully open-source large language model trained on curated French-language data. India announced a multilingual model covering 22 scheduled languages. The UAE expanded its Falcon mode

Friday Fundamentals

Output Validation: The Quality Inspector

26 Jun, 2026 | 09 Mins read

A factory quality inspector does not make the widgets. They check the widgets that came off the line. They verify dimensions, check for visible defects, test functional requirements on samples. Their

AI Infrastructure Production Readiness

From Single-User to Multi-User: The Ten Controls You Need Before You Scale

26 Jun, 2026 | 11 Mins read

An AI application built for a single user has no tenancy concerns. The user is the user. There is no data isolation problem because there is only one data set. There is no cost attribution problem bec
