Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Feature Store 2.0: Real-Time & Batch Unification
23 May, 2025 | 07 Mins read

A fraud detection model showed 94% accuracy in development. In production Friday evening, it flagged legitimate rides as fraudulent while missing obvious fraud patterns. Investigation revealed the cau

Bank Vault Double Key
16 May, 2025 | 04 Mins read

The most secure bank vault in the world requires two different keys, held by two different people, turned simultaneously. Neither person alone can open it. Now try coordinating this when the key holde

Composable Data Governance: Leveraging OpenMetadata & DataHub
16 May, 2025 | 07 Mins read

Data governance fails for predictable reasons. Organizations run quarterly committee meetings while their data infrastructure changes daily. They document schemas manually while automated systems gene

Fridge Magnet Letters Arriving Late
09 May, 2025 | 05 Mins read

Magnetic letters on a fridge, sent between rooms with a gap under the door. You send C-A-T in order, but your friend receives A-C-T. Or worse, C-T-A. Your cat becomes an act, or something that isn't a

Data Privacy Engineering: Differential Privacy & Synthetic Data
09 May, 2025 | 06 Mins read

For decades, organizations relied on anonymization—remove names, social security numbers, exact addresses—and data would be safe to share. This assumption has been shattered repeatedly. A telecommunic

The CAP Desert Triangle
02 May, 2025 | 06 Mins read

You're leading an expedition across a desert. Your team needs three things: Consistent maps (everyone has the same version), Available guides (can always get directions), and Partition tolerance (can

Automated Data Quality Gates with Great Expectations & Soda
28 Apr, 2025 | 07 Mins read

Organizations often treat data quality as secondary—something to address after building pipelines and training models. This perspective misunderstands modern data systems. In a world where ML models m

Window Functions: The Train Car View
25 Apr, 2025 | 05 Mins read

You're on a cross-country train, sitting by the window. As landscapes roll by, you can see not just where you are, but where you've been and where you're going. You can count how many red barns you've

Time-Travel Tables: Passport Stamp Method
18 Apr, 2025 | 04 Mins read

Open your passport and you see a story told in stamps: where you've been, when you arrived, when you left. Each stamp doesn't erase the previous ones; they accumulate, creating a complete travel hist