Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Semantic Caching for AI: Reducing Latency and Cost with Meaning-Based Retrieval
Semantic Caching for AI: Reducing Latency and Cost with Meaning-Based Retrieval
19 May, 2026 | 07 Mins read

Every repeated question your AI system answers is money spent and latency incurred that you did not need to. If a thousand users ask the same question in a week, running it through the language model

The ethics of training on copyrighted data — a nuanced take
The ethics of training on copyrighted data — a nuanced take
18 May, 2026 | 05 Mins read

The legal system has not caught up with the practice of training AI models on copyrighted data, and the people building AI systems are not waiting for it. Models trained on books, articles, code repos

The data quality scorecard: metrics that actually matter
The data quality scorecard: metrics that actually matter
17 May, 2026 | 06 Mins read

Most data quality initiatives fail not because teams lack tools, but because they measure the wrong things. Teams track hundreds of data quality metrics, generate dashboards full of green indicators,

The A2A protocol and what it means for enterprise AI
The A2A protocol and what it means for enterprise AI
16 May, 2026 | 03 Mins read

Google published the Agent-to-Agent (A2A) protocol specification in late 2025 and, as of this quarter, has secured endorsement from over fifty technology companies including Salesforce, SAP, ServiceNo

Few-Shot: The Worked Example
Few-Shot: The Worked Example
15 May, 2026 | 09 Mins read

You learned to solve quadratic equations from a textbook. The textbook did not just define the formula. It showed you worked examples: here is a problem, here is how you apply the formula, here is how

LLM evaluation platforms compared: LangSmith, Braintrust, Patronus
LLM evaluation platforms compared: LangSmith, Braintrust, Patronus
14 May, 2026 | 05 Mins read

Building an LLM application is the easy part. Knowing whether it works — whether it still works after you change a prompt, swap a model, or add a tool — is the hard part. LLM evaluation platforms exis

Building a data-driven culture: lessons from 50 engagements
Building a data-driven culture: lessons from 50 engagements
13 May, 2026 | 05 Mins read

The phrase "data-driven culture" has been emptied of meaning by overuse. It appears in every strategy deck, every job posting, every conference talk. Everyone claims to want it. Almost no one can desc

The vector database that couldn't scale — and what we did instead
The vector database that couldn't scale — and what we did instead
12 May, 2026 | 05 Mins read

A media company with a library of twelve million articles, transcripts, and research documents had built a semantic search system on a managed vector database. The system was designed to let journalis

The AI Data Pipeline: Special Considerations for Unstructured and Structured Data
The AI Data Pipeline: Special Considerations for Unstructured and Structured Data
11 May, 2026 | 13 Mins read

Data pipelines for AI are not the same as data pipelines for traditional software systems. The outputs are different. The failure modes are different. The tolerance for data quality issues is differen