Instead of checking out books and carrying them home, imagine a reading room where you think about page 547 of “War and Peace” and it appears before you—not a copy, but the actual page visible through enchanted glass. You read thousands of books without carrying a single one. That’s memory-mapped files (mmap): making disk storage appear as memory, letting you access file contents as variables without traditional file operation overhead.
The Traditional Library Checkout
Traditional file reading works like borrowing books:
- Walk to library (open file)
- Find your book (seek to position)
- Check it out (read into buffer)
- Carry it home (copy to memory)
- Read at home (process data)
- Return book (close file)
For every page you want, repeat the process. For large files, this means constant system calls, buffer management, and data copying.
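The checkout loop above looks like this in code — a minimal sketch using Python file I/O, with a throwaway file name (`books.dat`) made up for illustration:

```python
import os

# A hypothetical data file standing in for the "library".
path = "books.dat"
with open(path, "wb") as f:
    f.write(b"x" * 4096 * 4)  # four "pages" of data

# Traditional access: every page is a seek + read + copy.
with open(path, "rb") as f:   # walk to the library (open)
    f.seek(2 * 4096)          # find the book (seek)
    page = f.read(4096)       # check it out (read copies into a buffer)
# process `page`; closing the file returns the book

os.remove(path)
```

Each iteration of this pattern is a system call plus a copy into a user-space buffer — exactly the overhead mmap removes.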
The Magic Reading Room
Memory mapping transforms the experience. The OS creates a virtual memory region that maps directly to the file. You access bytes at arbitrary positions and the OS loads pages on demand.

How It Works
- Call mmap() with file descriptor and length
- OS creates virtual memory region mapping to file
- Access any byte in the region
- OS loads corresponding file pages as needed
- Modifications write back to file (depending on flags)
No explicit read calls. No buffer management. Just pointer arithmetic.
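Those steps can be sketched with Python's mmap module, which wraps the same mmap() system call (the file name here is made up for the demo):

```python
import mmap
import os

path = "mapped.bin"
with open(path, "wb") as f:
    f.write(b"Hello, mapped world!")

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    first = m[0:5]                # index like an array; no read() call
    m[0:5] = b"HELLO"             # writes go back through the mapping
    m.close()

with open(path, "rb") as f:       # the modification landed in the file
    data = f.read()
os.remove(path)
```

Slicing `m` is just memory access from the program's point of view; the OS faults the backing page in the first time it is touched.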
Real-World Scenarios
Database Index
Traditional approach: read index block 1, search, read block 2, continue until found. Thousands of read syscalls for one query.
Memory mapped: map entire index file, search in memory, OS loads pages as accessed. The access pattern looks like sequential memory reads.
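As a sketch of that access pattern, here is a binary search over a mapped index file. The index format (sorted 8-byte big-endian keys) is invented for the example; a real database index is more elaborate, but the point stands: only the few records the search touches ever get paged in.

```python
import mmap
import os
import struct

# Hypothetical index format: sorted 8-byte big-endian unsigned keys.
path = "index.bin"
keys = [3, 17, 42, 99, 256, 1024]
with open(path, "wb") as f:
    for k in keys:
        f.write(struct.pack(">Q", k))

def lookup(m, target):
    """Binary search the mapped index; the OS loads only the pages touched."""
    lo, hi = 0, len(m) // 8
    while lo < hi:
        mid = (lo + hi) // 2
        k = struct.unpack_from(">Q", m, mid * 8)[0]
        if k < target:
            lo = mid + 1
        elif k > target:
            hi = mid
        else:
            return mid
    return -1  # not found

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    pos = lookup(m, 42)   # found at record 2
    miss = lookup(m, 7)   # absent: -1
    m.close()
os.remove(path)
```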
Log Analyzer
Without mmap: read chunk into buffer, process chunk, read next chunk, complex boundary handling, memory management burden.
With mmap: map 10GB log file, scan like a normal array, jump to any position instantly, OS handles paging. Simpler code, often faster execution.
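A sketch of that scanning style, with a tiny made-up log standing in for the 10GB file — note there is no chunking loop and no boundary handling:

```python
import mmap
import os

path = "app.log"  # a made-up log file for illustration
with open(path, "wb") as f:
    f.write(b"INFO start\nERROR crash\nWARN low disk\nERROR retry\n")

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Scan the whole mapping like one array; find() jumps to any position.
    n_errors, i = 0, -1
    while (i := m.find(b"ERROR", i + 1)) != -1:
        n_errors += 1
    m.close()
os.remove(path)
```

The same code works unchanged whether the file is 50 bytes or 10GB; the OS pages in whatever the scan touches.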
Shared Dictionary
Traditional sharing: each program reads the file independently, each maintains its own copy, 10 programs consume 10x memory.
mmap sharing: map file once, all programs share the same pages, modifications visible to all processes, memory used exactly once.
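A minimal sketch of shared pages across processes, using os.fork() (POSIX-only, so this will not run on Windows); file-backed maps in Python are MAP_SHARED by default:

```python
import mmap
import os

path = "shared.bin"
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)  # file-backed: MAP_SHARED by default
    pid = os.fork()               # child inherits the same mapping
    if pid == 0:
        m[0:5] = b"hello"         # child writes through the shared pages
        os._exit(0)
    os.waitpid(pid, 0)
    seen = m[0:5]                 # parent sees the child's write
    m.close()
os.remove(path)
```

No pipes, no serialization: both processes are looking at the same physical pages.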
Types of Mapping
Read-Only Mapping
Looking through protective glass. You can see all pages but cannot modify them. The OS can optimize aggressively since data never changes.
Private Mapping (Copy-on-Write)
Modifications go to a private copy. The original file stays unchanged. Useful when you need to process and modify without affecting the source.
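Copy-on-write behavior can be demonstrated with Python's ACCESS_COPY mode, which corresponds to MAP_PRIVATE (file name invented for the demo):

```python
import mmap
import os

path = "source.txt"
with open(path, "wb") as f:
    f.write(b"original")

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)  # MAP_PRIVATE
    m[0:8] = b"MODIFIED"       # the write goes to a private copy of the page
    private_view = m[0:8]
    m.close()

with open(path, "rb") as f:    # the file on disk is untouched
    on_disk = f.read()
os.remove(path)
```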
Shared Mapping
All processes see the same content and modifications. True shared memory between processes. Requires coordination to avoid conflicts.
Anonymous Mapping
Memory allocation without a backing file. Useful for inter-process communication or temporary buffers that don’t need persistence.
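In Python's mmap module, passing -1 as the file descriptor requests an anonymous mapping — a plain region of memory with no file behind it:

```python
import mmap

# fileno -1 asks for an anonymous mapping: memory with no backing file.
buf = mmap.mmap(-1, mmap.PAGESIZE)
buf[0:4] = b"temp"     # use it as a scratch buffer
data = buf[0:4]
buf.close()            # the contents vanish; nothing was ever on disk
```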
Page Fault Mechanics
Demand Paging
You can map a 1TB file but no memory is used until you access specific pages. Access page 1000 and only that page loads. Efficient for sparse access patterns.
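Demand paging is easy to see with a sparse file: the sketch below truncates a file out to 1 GiB (no blocks are actually written, assuming a filesystem with sparse-file support), maps it, and touches a single byte — so only one page is ever faulted in:

```python
import mmap
import os

path = "huge.bin"
size = 1 << 30                 # ask for a 1 GiB file...
with open(path, "wb") as f:
    f.truncate(size)           # ...created sparse: no data blocks written

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)
    byte = m[size - 1]         # only the final page is faulted in (zero-filled)
    m[size - 1] = 0x7F         # and only that one page becomes dirty
    m.close()
os.remove(path)
```

A gigabyte of address space, roughly one page of RAM.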
Read-Ahead
OS notices sequential access and pre-loads upcoming pages. You read page 100, OS loads 101-110. Reduces latency for sequential scans.
Page Eviction
When memory pressure increases, least-recently-used mapped pages are evicted. They disappear from RAM but the mapping remains. Access the page again and it reloads from disk.
Common Problems
Segmentation Fault
Accessing beyond the end of the mapping causes SIGSEGV, and touching mapped pages past the end of the file raises SIGBUS. The file has 500 pages but you try to read page 1000. Check the file size with fstat() before mapping and keep every access within the mapped length.
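Python's mmap wrapper enforces the mapping length itself (raising IndexError rather than delivering a signal), but the defensive pattern is the same as in C: derive the bound from fstat() and stay inside it. A sketch with a made-up 20-byte file:

```python
import mmap
import os

path = "small.bin"
with open(path, "wb") as f:
    f.write(b"only twenty bytes!!!")

with open(path, "rb") as f:
    length = os.fstat(f.fileno()).st_size  # check the real size first
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        _ = m[length + 4096]               # past the mapping
        out_of_range = False
    except IndexError:                     # in C this would be a fatal signal
        out_of_range = True
    m.close()
os.remove(path)
```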
Coherency Confusion
File modified outside your process while you hold it mapped. Your view becomes stale. Use file locking or explicit syncing for coordination.
32-bit Limitation
4GB address space on 32-bit systems limits how much you can map. Large files require sliding windows or 64-bit systems.
Performance Paradox
Random access causes page faults, and each cold fault costs a disk read. For small files, the cost of setting up the mapping can exceed any savings. Even through mmap, sequential access beats random access, because read-ahead hides most of the faults.
Implementation
Basic Operations
- Map: open(), fstat() for size, mmap() with PROT_READ and/or PROT_WRITE, then use pointers
- Unmap: munmap() when done; the OS handles cleanup
- Sync: msync() to force writes to disk, with MS_SYNC or MS_ASYNC flags
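The full map/use/sync/unmap lifecycle, sketched through Python's wrappers over the same POSIX calls (flush() corresponds to msync(), close() to munmap(); the file name is invented):

```python
import mmap
import os

path = "data.bin"
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

with open(path, "r+b") as f:
    size = os.fstat(f.fileno()).st_size  # fstat() for the length
    m = mmap.mmap(f.fileno(), size)      # mmap(); readable and writable here
    m[0:6] = b"synced"                   # use it like memory
    m.flush()                            # msync(): force dirty pages to disk
    m.close()                            # munmap()

with open(path, "rb") as f:
    head = f.read(6)
os.remove(path)
```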
Platform Differences
Linux offers rich mmap options, huge page support, and good performance. Windows uses CreateFileMapping() and MapViewOfFile() with different terminology but similar concepts. macOS uses BSD-style mmap with a unified buffer cache.
When to Use mmap
Use mmap for:
- Large files with random access patterns
- Shared memory between processes
- Read-mostly workloads where files act like databases
- When you want file access to feel like memory access
Skip mmap for:
- Small files where overhead exceeds benefit
- Purely sequential access where read-ahead works fine
- Heavy write workloads that cause many page faults
- When portability across exotic systems matters
Decision Rules
Map a file when:
- You need random access to large files
- Multiple processes will read the same data
- You prefer memory-style access over file-style access
- The OS virtual memory manager is smarter than your buffering code
Reach for read()/write() when:
- Access patterns are strictly sequential
- You need portable error handling
- Files are small enough to buffer entirely
- You lack control over page fault rates
The enchanted glass shows you the actual page. No checkout required. Every page at your fingertips.