Simor
Data Infrastructure for Production AI
Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.
Counting attendees at a massive festival: counting each person individually requires heavy infrastructure when there are millions of them. Sampling small areas and extrapolating fails when the crowd is unevenly distributed. Th
A RAG application that works perfectly with toy datasets grinds to a halt at production scale. The vector database that benchmarked beautifully with 10K vectors performs terribly at 10M. The one that
Word ladder games start with "CAT", change one letter to get "COT", then "DOT", then "DOG". Now imagine all possible words connected in a web where shared prefixes create natural pathways. That's a tr
Every team has its own definition of "revenue." The CFO calculates it one way, marketing another, and product a third. Each calculation is technically correct; they just use different definitions, ti
At a library entrance, a master directory directs you: "A-G: Left Wing, H-P: Center Hall, Q-Z: Right Wing." You head to the Right Wing where another sign says "Q-S: Aisle 1-3, T-V: Aisle 4-6." Followi
Picture a pizza shop on Friday night. Method one: single pizza cutter, cut one line at a time, eight cuts for eight slices. Method two: eight pizza cutters attached to one handle, perfect spacing, one
Human communication is multimodal: we gesture while speaking, draw diagrams while explaining, and understand meaning through the interplay of sensory inputs. Yet most AI systems operate in silos—compu
Instead of checking out books and carrying them home, imagine a reading room where you think about page 547 of "War and Peace" and it appears before you—not a copy, but the actual page visible through
At a family dinner, Grandma wants to pass mashed potatoes to Cousin Jim across the table. The inefficient approach: Grandma scoops potatoes onto her plate, passes to Uncle Bob, who scoops onto his pla