Simor
Data Infrastructure for Production AI
Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.
A SaaS company with 200 support agents and 10,000+ knowledge base articles had an 18-hour average response time and 23% first-contact resolution. Their largest enterprise client threatened to cancel a
To verify that two people are identical twins using DNA, you could sequence their entire 3-billion-base-pair genomes and compare every position. Or use genetic fingerprinting: hash specific DNA regions int
Thousands of children play at a beach, each leaving footprints. Tracking each child's visits individually becomes impossible at scale. Instead, imagine multiple shallow sandpits with different grid pa
Deploying AI in regulated industries—banks, insurance, healthcare—requires more than technical excellence. A model that's a black box cannot satisfy regulatory requirements for explainability. Trainin
Counting attendees at a massive festival: counting each person individually requires heavy infrastructure when there are millions of attendees. Sampling small areas and extrapolating fails when the crowd is unevenly distributed. Th
A RAG application that works perfectly with toy datasets grinds to a halt at production scale. The vector database that benchmarked beautifully with 10K vectors performs terribly at 10M. The one that
In a word ladder game, you start with "CAT", change one letter to get "COT", then "DOT", then "DOG". Now imagine all possible words connected in a web where shared prefixes create natural pathways. That's a tr
Every team has its own definition of "revenue." The CFO calculates it one way, marketing another, and product a third. Each calculation is technically correct: they just use different definitions, ti
At a library entrance, a master directory directs you: "A-G: Left Wing, H-P: Center Hall, Q-Z: Right Wing." You head to the Right Wing where another sign says "Q-S: Aisle 1-3, T-V: Aisle 4-6." Followi