In search of a fast database (sorry, Raft authors)
Kevin Leung
November 2025
Why I’m Building This
So I saw pogocache by tidwall - it’s this insanely fast in-memory cache that beats Redis, Memcached, basically everything. The benchmarks are ridiculous. It’s got all these cool optimizations - sharded hashmap, Robin Hood hashing - and crazy-low latency.
And I thought: I want to build a cache in C too.
Around the same time, someone I know mentioned this recent VLDB 2025 paper about Local Reads and Linearizable Asynchronous Replication.
I’d been interested in distributed systems for a while (reading papers is kind of my second hobby after psych stuff), and I’d just finished working on a simple gossip cache using HashiCorp’s SWIM implementation for a different project. This felt like a natural next step - go lower level, build something harder and more fun.
That’s how Lygus started: saw a fast cache, read a paper about Almost-Local Reads, wondered if I could combine them with Disk Paxos to build something distributed and crash-tolerant.
Almost-Local Reads
The problem with current distributed databases is that to keep things consistent (linearizable), reads usually need to coordinate with other nodes. In Raft, all reads go through the leader. In ABD-style protocols, every read hits a quorum. This coordination is expensive and kills throughput.
Almost-Local Reads fix this with a clever trick: instead of coordinating for every single read, you batch a bunch of reads together and do ONE lightweight “sync” per batch. The sync is basically a fake write (a NOOP) that proves your replica is caught up. Once that sync commits, all the batched reads execute locally against your replica’s state.
In theory, the cost of one sync gets amortized across the whole batch. If you batch 128 reads together (common in high-load scenarios), the per-read overhead becomes tiny - hence “almost-local.”
The VLDB 2025 paper actually proves something called the “LAW theorem” - that truly local linearizable reads are impossible under asynchrony and crash faults. So ALRs are literally the next best thing: not quite local, but close enough that you get massive throughput gains without sacrificing safety.
What I’m Actually Building
Using willemt/raft for consensus - the ALR paper is consensus-agnostic, so any algorithm works. Storage is a Write-Ahead Log (append-only, Zstd compression, CRC32C checksums, 64KB blocks) plus periodic snapshots of the whole KV store. The KV store itself is just a simple hash table with FNV-1a hashing.
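For flavor, here’s roughly what 64-bit FNV-1a looks like - the constants are the standard FNV offset basis and prime, and the function name is just illustrative, not the actual Lygus API:

#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime. */
static uint64_t fnv1a_64(const void *data, size_t len) {
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}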
For reads, I’m implementing Lazy-ALR batching: buffer incoming reads, send a NOOP_SYNC through consensus, and execute the reads locally once it commits. The optimization: if a real write is already in flight, use it as the sync (zero extra cost).
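To make that concrete, here’s a minimal sketch of the shape I have in mind for the batching loop - assume a single-threaded event loop, and note that read_req_t, propose_noop_sync(), and serve_locally() are placeholders rather than real Lygus APIs:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_BATCH 128

typedef struct { const char *key; } read_req_t;   /* placeholder request type */

typedef struct {
    read_req_t *reqs[MAX_BATCH];   /* reads buffered until the next sync point */
    size_t      count;
    bool        sync_in_flight;
} read_batch_t;

/* Stand-ins for the consensus layer and the local KV store. */
static void propose_noop_sync(void)      { /* submit a NOOP entry through consensus */ }
static void serve_locally(read_req_t *r) { printf("read %s from local state\n", r->key); }

/* A read arrives: buffer it, and start a sync only if one isn't already pending. */
static void on_read(read_batch_t *b, read_req_t *r) {
    if (b->count == MAX_BATCH) return;   /* a real system would flush or push back here */
    b->reqs[b->count++] = r;
    if (!b->sync_in_flight) {
        b->sync_in_flight = true;
        propose_noop_sync();             /* or piggyback on a real write already in flight */
    }
}

/* The sync committed: the replica is provably caught up, so every buffered
 * read can be answered from local state with no further coordination. */
static void on_sync_committed(read_batch_t *b) {
    for (size_t i = 0; i < b->count; i++)
        serve_locally(b->reqs[i]);
    b->count = 0;
    b->sync_in_flight = false;
}

The nice property: no matter how many reads pile up in the batch, they all share a single trip through consensus.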
Performance stuff: lock-free MPSC ring buffer for logging, hardware CRC32C using SSE4.2 when available.
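For the curious, the hardware path is roughly this - a sketch only, since the real version also needs runtime CPUID detection and a software fallback, and this file has to be built with SSE4.2 enabled:

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>   /* SSE4.2 CRC32 intrinsics; compile with -msse4.2 */

/* CRC32C over 8-byte chunks with a byte-at-a-time tail. Uses the usual
 * convention of inverting the running CRC on entry and exit. */
static uint32_t crc32c_hw(uint32_t crc, const void *data, size_t len) {
    const uint8_t *p = data;
    crc = ~crc;
    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, p, 8);                       /* avoid unaligned loads */
        crc = (uint32_t)_mm_crc32_u64(crc, chunk);  /* one 8-byte step */
        p += 8;
        len -= 8;
    }
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}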
What’s Done So Far
Built the supporting infrastructure first: a lock-free logging system (MPSC ring buffer, 64-byte cache-aligned events, atomic flags per slot), hardware-accelerated CRC32C with runtime CPU detection and software fallback, a basic chaining hash table for the KV store, LEB128 varint encoding for space-efficient log entries, and a comprehensive error system.
The logging system was needed for debugging distributed systems anyway (there’s a rough sketch of its slot layout below). CRC32C is critical for WAL corruption detection - it uses SSE4.2 intrinsics (_mm_crc32_u64) on x86-64 and processes 8 bytes at a time. The KV store is intentionally simple - it auto-resizes at 75% load, tracks memory usage, and has an iterator for snapshots.
/* Store a value under key (klen/vlen are byte lengths). */
int lygus_kv_put(lygus_kv_t *kv, const void *key, size_t klen,
                 const void *val, size_t vlen);

/* Copy the value for key into buf (blen = capacity of buf). */
ssize_t lygus_kv_get(lygus_kv_t *kv, const void *key, size_t klen,
                     void *buf, size_t blen);
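Since the logger keeps coming up, here’s a rough sketch of what I mean by “64-byte cache-aligned events, atomic flags per slot” - field names and sizes are illustrative, not the actual Lygus structs:

#include <stdint.h>
#include <stdatomic.h>

/* One event = one 64-byte cache line, so concurrent producers never share a line. */
typedef struct __attribute__((aligned(64))) {
    _Atomic uint32_t ready;    /* 0 = slot free, 1 = published by a producer */
    uint32_t         level;    /* log level */
    uint64_t         tsc;      /* timestamp */
    char             msg[48];  /* fixed-size payload keeps the slot exactly one line */
} log_event_t;

typedef struct {
    _Atomic uint64_t head;          /* producers claim slots with fetch_add */
    uint64_t         tail;          /* the single consumer drains from here */
    log_event_t      slots[1 << 12];
} log_ring_t;

Producers fetch_add on head, fill their slot, then set ready; the single consumer checks slots[tail].ready, drains the event, and clears the flag.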
Error codes cover everything from generic errors (NOMEM, INVALID_ARG) to distributed systems stuff (NO_LEADER, STALE_READ, NO_QUORUM), with helpers like lygus_strerror(), lygus_is_retryable(), and lygus_is_fatal(). Of course, these are just the building blocks for the actually difficult parts of this project.
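Client-side, I picture the helpers being used something like this - assuming (and it’s only an assumption) that 0 means success, lygus_strerror() returns a string, and lygus_is_retryable() returns a boolean:

#include <stdio.h>
/* plus the project header that declares lygus_kv_put() and the error helpers */

/* Hypothetical retry wrapper, not a real Lygus function. */
static int put_with_retry(lygus_kv_t *kv, const void *key, size_t klen,
                          const void *val, size_t vlen) {
    for (int attempt = 0; attempt < 3; attempt++) {
        int rc = lygus_kv_put(kv, key, klen, val, vlen);
        if (rc == 0) return 0;
        if (!lygus_is_retryable(rc)) {
            /* e.g. NOMEM or INVALID_ARG: give up and report */
            fprintf(stderr, "put failed: %s\n", lygus_strerror(rc));
            return rc;
        }
        /* retryable (NO_LEADER, NO_QUORUM, ...): back off and try again */
    }
    return -1;
}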
Next Up: The WAL
The Write-Ahead Log is what I’m working on now. The tricky part is balancing durability (every write hits disk before being acknowledged), performance (batching and compression to avoid killing throughput), and recovery (rebuild state from the log after crashes).
The simplest approach: write to the WAL, fsync(), then apply to in-memory state. That way the WAL is never behind the in-memory state, which makes recovery straightforward. But there are optimizations worth layering on: group commit (batch multiple writes into one fsync), compression (Zstd on 64KB blocks), and async apply (write and apply concurrently, but queries wait for their Log Sequence Number to be durable before responding).
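As a baseline, the simple path looks something like this - a sketch with a made-up record layout (length prefix + payload), ignoring short writes, block framing, checksums, and compression:

#include <stdint.h>
#include <unistd.h>

typedef struct {
    int      fd;        /* append-only log file, opened with O_APPEND */
    uint64_t next_lsn;  /* LSN assigned to the next record */
} wal_t;

/* Append one record and make it durable before the caller acknowledges anything. */
static int wal_append(wal_t *wal, const void *rec, uint32_t len, uint64_t *lsn_out) {
    uint64_t lsn = wal->next_lsn++;

    /* 1. Write a tiny header (just the length here) plus the payload. */
    if (write(wal->fd, &len, sizeof len) != (ssize_t)sizeof len) return -1;
    if (write(wal->fd, rec, len) != (ssize_t)len) return -1;

    /* 2. Durability point. Group commit would let several records share this fsync. */
    if (fsync(wal->fd) != 0) return -1;

    /* 3. Only now is it safe to apply to the in-memory KV and reply to the client. */
    *lsn_out = lsn;
    return 0;
}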
Each in-memory object tracks the LSN it was last modified at, and each query tracks the maximum LSN it has observed. Before sending a response, you wait for that LSN to hit disk. Standard database stuff, but I’m still figuring out the right way to implement it.
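Roughly, the gate I’m picturing looks like this (names are illustrative, and a real version would park the query on a condition variable instead of spinning):

#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint64_t durable_lsn;   /* advanced after each successful fsync */

typedef struct { uint64_t last_lsn; /* ... plus the actual value ... */ } kv_obj_t;
typedef struct { uint64_t max_lsn;  } query_t;

/* Every object the query reads or writes bumps the query's high-water mark. */
static void query_touch(query_t *q, const kv_obj_t *obj) {
    if (obj->last_lsn > q->max_lsn)
        q->max_lsn = obj->last_lsn;
}

/* Before responding: wait until everything the query observed is on disk. */
static void query_wait_durable(const query_t *q) {
    while (atomic_load(&durable_lsn) < q->max_lsn)
        ;   /* spin for illustration only */
}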
Part 2 will be about the WAL implementation once I actually build it - block framing, recovery, how it integrates with Disk Paxos, all that.
Timeline (may change depending on how busy finals get)
Targeting 4-5 weeks:
- Weeks 1-2: WAL implementation
- Weeks 3-4: Raft integration + ALR batching
- Week 5: Jepsen-style testing with Porcupine for linearizability
The plan is to open-source it once it’s done, then move on to the next project. Right now I’m just building this to learn - I’m bored and might as well do something hard.
Why This Actually Matters
Most distributed databases make tradeoffs - Raft/Paxos protocols are linearizable but slow (leader bottleneck or quorum required), eventually consistent systems are fast but offer only weak guarantees, and lease-based systems are fast but assume synchrony (which breaks under network delays).
Lygus is an attempt to combine Disk Paxos + Hermes invalidations + Almost-Local Reads to get linearizable reads that are almost as fast as local reads, without assuming your network behaves perfectly. I don’t expect it to be perfect, but it’s still fun to build.
Part 2 is coming in a few weeks once I’ve actually implemented the WAL. You can follow along on my GitHub, or email me at leungke@oregonstate.edu if you spot any mistakes, have suggestions, or just want to send threatening emails.
P.S. I kinda organized Lygus like a Go project instead of a C one. (Sorry, C traditionalists.)
References