Simor
Data Infrastructure for Production AI
Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.
Reorganize an enormous filing cabinet. Instead of keeping complete employee records in manila folders (one folder per person with all their information), you create specialized drawers: one for all sa
A magical sketchpad shared by artists around the world. Each artist has their own copy, draws whenever inspiration strikes, and somehow - without talking to each other, without a master artist coordin
A fraud detection model showed 94% accuracy in development. In production Friday evening, it flagged legitimate rides as fraudulent while missing obvious fraud patterns. Investigation revealed the cau
The most secure bank vault in the world requires two different keys, held by two different people, turned simultaneously. Neither person alone can open it. Now try coordinating this when the key holde
Data governance fails for predictable reasons. Organizations run quarterly committee meetings while their data infrastructure changes daily. They document schemas manually while automated systems gene
Magnetic letters on a fridge, sent between rooms with a gap under the door. You send C-A-T in order, but your friend receives A-C-T. Or worse, C-T-A. Your cat becomes an act, or something that isn't a
For decades, organizations relied on anonymization—remove names, social security numbers, exact addresses—and data would be safe to share. This assumption has been shattered repeatedly. A telecommunic
You're leading an expedition across a desert. Your team needs three things: Consistent maps (everyone has the same version), Available guides (can always get directions), and Partition tolerance (can
Organizations often treat data quality as secondary—something to address after building pipelines and training models. This perspective misunderstands modern data systems. In a world where ML models m