Simor
Data Infrastructure for Production AI
Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.
For decades, organizations relied on anonymization—remove names, social security numbers, exact addresses—and data would be safe to share. This assumption has been shattered repeatedly. A telecommunic
You're leading an expedition across a desert. Your team needs three things: Consistent maps (everyone has the same version), Available guides (can always get directions), and Partition tolerance (can
Organizations often treat data quality as secondary—something to address after building pipelines and training models. This perspective misunderstands modern data systems. In a world where ML models m
You're on a cross-country train, sitting by the window. As landscapes roll by, you can see not just where you are, but where you've been and where you're going. You can count how many red barns you've
Open your passport and you see a story told in stamps: where you've been, when you arrived, when you left. Each stamp doesn't erase the previous ones - they accumulate, creating a complete travel hist
Large language models suffer from a critical flaw: their knowledge is frozen at training time, encoded implicitly in billions of parameters, and prone to confident fabrication. This limitation becomes
You're at a vending machine, desperately needing caffeine. You insert a dollar, press B4 for coffee, but nothing happens. Did the machine eat your money? Did it register the button press? In frustrati
Large language models shattered the deterministic paradigm of traditional software. The same prompt can produce different outputs. Model behavior emerges from billions of parameters trained on vast te
Picture yourself at a busy airport gate. The agent announces: "We'll now board passengers in rows 20 through 30." Simple, efficient, everyone knows whether it's their turn. Now imagine instead they sa