Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Data Privacy Engineering: Differential Privacy & Synthetic Data
Data Privacy Engineering: Differential Privacy & Synthetic Data
09 May, 2025 | 06 Mins read

For decades, organizations relied on anonymization—remove names, social security numbers, exact addresses—and data would be safe to share. This assumption has been shattered repeatedly. A telecommunic

The CAP Desert Triangle
The CAP Desert Triangle
02 May, 2025 | 06 Mins read

You're leading an expedition across a desert. Your team needs three things: Consistent maps (everyone has the same version), Available guides (can always get directions), and Partition tolerance (can

Automated Data Quality Gates with Great Expectations & Soda
Automated Data Quality Gates with Great Expectations & Soda
28 Apr, 2025 | 07 Mins read

Organizations often treat data quality as secondary—something to address after building pipelines and training models. This perspective misunderstands modern data systems. In a world where ML models m

Window Functions: The Train Car View
Window Functions: The Train Car View
25 Apr, 2025 | 05 Mins read

You're on a cross-country train, sitting by the window. As landscapes roll by, you can see not just where you are, but where you've been and where you're going. You can count how many red barns you've

Time-Travel Tables: Passport Stamp Method
Time-Travel Tables: Passport Stamp Method
18 Apr, 2025 | 04 Mins read

Open your passport and you see a story told in stamps: where you've been, when you arrived, when you left. Each stamp doesn't erase the previous ones - they accumulate, creating a complete travel hist

Retrieval-Augmented Generation at Scale: Designing the RAG Pipeline
Retrieval-Augmented Generation at Scale: Designing the RAG Pipeline
17 Apr, 2025 | 07 Mins read

Large language models suffer from a critical flaw: their knowledge is frozen at training time, encoded implicitly in billions of parameters, and prone to confident fabrication. This limitation becomes

Idempotency: Vending Machine Coin Trick
Idempotency: Vending Machine Coin Trick
11 Apr, 2025 | 03 Mins read

You're at a vending machine, desperately needing caffeine. You insert a dollar, press B4 for coffee, but nothing happens. Did the machine eat your money? Did it register the button press? In frustrati

LLM Prompt Engineering Frameworks: Patterns for Enterprise Apps
LLM Prompt Engineering Frameworks: Patterns for Enterprise Apps
06 Apr, 2025 | 09 Mins read

Large language models shattered the deterministic paradigm of traditional software. The same prompt can produce different outputs. Model behavior emerges from billions of parameters trained on vast te

Seek > Offset: Airline Boarding Pass Analogy
Seek > Offset: Airline Boarding Pass Analogy
04 Apr, 2025 | 03 Mins read

Picture yourself at a busy airport gate. The agent announces: "We'll now board passengers in rows 20 through 30." Simple, efficient, everyone knows whether it's their turn. Now imagine instead they sa