Simor

Data Infrastructure for Production AI

Practical writing on AI data engineering, feature stores, and the infrastructure choices that determine whether AI systems work in production.

Case Study: End-to-End RAG Platform for Customer Support
05 Dec, 2025 | 05 Mins read

A SaaS company with 200 support agents and 10,000+ knowledge base articles had an 18-hour average response time and 23% first-contact resolution. Their largest enterprise client threatened to cancel a

Merkle Trees: DNA Fingerprint
28 Nov, 2025 | 03 Mins read

To verify that two people are identical twins using DNA, you could sequence their entire 3-billion-base-pair genomes and compare every position. Or use genetic fingerprinting: hash specific DNA regions int

Count-Min: Sandpit Layers
21 Nov, 2025 | 03 Mins read

Thousands of children play at a beach, each leaving footprints. Tracking each child's visits individually becomes impossible at scale. Instead, imagine multiple shallow sandpits with different grid pa

AI in Regulated Industries: Compliance Patterns for Finance & Healthcare
21 Nov, 2025 | 04 Mins read

Deploying AI in regulated industries—banks, insurance, healthcare—requires more than technical excellence. A model that's a black box cannot satisfy regulatory requirements for explainability. Trainin

HyperLogLog: Counting Crowd with Drones
14 Nov, 2025 | 03 Mins read

Consider counting attendees at a massive festival: counting millions of people individually requires enormous infrastructure. Sampling small areas and extrapolating fails when the crowd is unevenly distributed. Th

Benchmarking Vector Databases: Performance, Cost & Ecosystem
14 Nov, 2025 | 05 Mins read

A RAG application that works perfectly with toy datasets grinds to a halt at production scale. The vector database that benchmarked beautifully with 10K vectors performs terribly at 10M. The one that

Tries: The Word Ladder
07 Nov, 2025 | 03 Mins read

Word ladder games start with "CAT", change one letter to get "COT", then "DOT", then "DOG". Now imagine all possible words connected in a web where shared prefixes create natural pathways. That's a tr

Semantic Layers & Metrics Stores: dbt Semantic Layer, Cube, Transform
07 Nov, 2025 | 05 Mins read

Every team has its own definition of "revenue." The CFO calculates it one way, marketing another, and product a third. Each calculation is technically correct; they just use different definitions, ti

B+ Trees: Organised Bookshelf
31 Oct, 2025 | 03 Mins read

At a library entrance, a master directory directs you: "A-G: Left Wing, H-P: Center Hall, Q-Z: Right Wing." You head to the Right Wing where another sign says "Q-S: Aisle 1-3, T-V: Aisle 4-6." Followi