Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Column Stores: The Vertical Filing Cabinet
Column Stores: The Vertical Filing Cabinet
30 May, 2025 | 04 Mins read

Reorganize an enormous filing cabinet. Instead of keeping complete employee records in manila folders (one folder per person with all their information), you create specialized drawers: one for all sa

CRDTs: The Cooperative Sketchpad
CRDTs: The Cooperative Sketchpad
23 May, 2025 | 04 Mins read

A magical sketchpad shared by artists around the world. Each artist has their own copy, draws whenever inspiration strikes, and somehow - without talking to each other, without a master artist coordin

Feature Store 2.0: Real-Time & Batch Unification
Feature Store 2.0: Real-Time & Batch Unification
23 May, 2025 | 07 Mins read

A fraud detection model showed 94% accuracy in development. In production Friday evening, it flagged legitimate rides as fraudulent while missing obvious fraud patterns. Investigation revealed the cau

Bank Vault Double Key
Bank Vault Double Key
16 May, 2025 | 04 Mins read

The most secure bank vault in the world requires two different keys, held by two different people, turned simultaneously. Neither person alone can open it. Now try coordinating this when the key holde

Composable Data Governance: Leveraging OpenMetadata & DataHub
Composable Data Governance: Leveraging OpenMetadata & DataHub
16 May, 2025 | 07 Mins read

Data governance fails for predictable reasons. Organizations run quarterly committee meetings while their data infrastructure changes daily. They document schemas manually while automated systems gene

Fridge Magnet Letters Arriving Late
Fridge Magnet Letters Arriving Late
09 May, 2025 | 05 Mins read

Magnetic letters on a fridge, sent between rooms with a gap under the door. You send C-A-T in order, but your friend receives A-C-T. Or worse, C-T-A. Your cat becomes an act, or something that isn't a

Data Privacy Engineering: Differential Privacy & Synthetic Data
Data Privacy Engineering: Differential Privacy & Synthetic Data
09 May, 2025 | 06 Mins read

For decades, organizations relied on anonymization—remove names, social security numbers, exact addresses—and data would be safe to share. This assumption has been shattered repeatedly. A telecommunic

The CAP Desert Triangle
The CAP Desert Triangle
02 May, 2025 | 06 Mins read

You're leading an expedition across a desert. Your team needs three things: Consistent maps (everyone has the same version), Available guides (can always get directions), and Partition tolerance (can

Automated Data Quality Gates with Great Expectations & Soda
Automated Data Quality Gates with Great Expectations & Soda
28 Apr, 2025 | 07 Mins read

Organizations often treat data quality as secondary—something to address after building pipelines and training models. This perspective misunderstands modern data systems. In a world where ML models m