Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Window Functions: The Train Car View
25 Apr, 2025 | 05 Mins read

You're on a cross-country train, sitting by the window. As landscapes roll by, you can see not just where you are, but where you've been and where you're going. You can count how many red barns you've

Time-Travel Tables: Passport Stamp Method
18 Apr, 2025 | 04 Mins read

Open your passport and you see a story told in stamps: where you've been, when you arrived, when you left. Each stamp doesn't erase the previous ones - they accumulate, creating a complete travel hist

Retrieval-Augmented Generation at Scale: Designing the RAG Pipeline
17 Apr, 2025 | 07 Mins read

Large language models suffer from a critical flaw: their knowledge is frozen at training time, encoded implicitly in billions of parameters, and prone to confident fabrication. This limitation becomes

Idempotency: Vending Machine Coin Trick
11 Apr, 2025 | 03 Mins read

You're at a vending machine, desperately needing caffeine. You insert a dollar, press B4 for coffee, but nothing happens. Did the machine eat your money? Did it register the button press? In frustrati

LLM Prompt Engineering Frameworks: Patterns for Enterprise Apps
06 Apr, 2025 | 09 Mins read

Large language models shattered the deterministic paradigm of traditional software. The same prompt can produce different outputs. Model behavior emerges from billions of parameters trained on vast te

Seek > Offset: Airline Boarding Pass Analogy
04 Apr, 2025 | 03 Mins read

Picture yourself at a busy airport gate. The agent announces: "We'll now board passengers in rows 20 through 30." Simple, efficient, everyone knows whether it's their turn. Now imagine instead they sa

Bloom Filters: The Forgetful Bouncer
28 Mar, 2025 | 06 Mins read

Picture a nightclub bouncer with a peculiar condition: they never forget a face they've seen, but sometimes they think they've seen faces they haven't. When someone approaches, they'll either say "You've defi

Responsible AI by Design: Embedding Ethics into Data Architecture
26 Mar, 2025 | 09 Mins read

AI systems increasingly make decisions that profoundly affect human lives. Healthcare systems deny treatment recommendations based on zip codes. Hiring platforms filter resumes based on gender. Crimin

Tracing Spans as Russian Nesting Dolls
21 Mar, 2025 | 03 Mins read

Russian nesting dolls (Matryoshka) are wooden dolls where each one opens to reveal a smaller doll inside, which opens to reveal another, and so on. Each doll represents an operation in your distribute