Reed-Solomon parity for backup pools
Frostvex stores chunks with parity blocks — Reed-Solomon error-correcting codes — so that if a chunk goes bad on one peer, the receiver can reconstruct it from the stripe's surviving chunks on other peers plus the parity. This post is the short version of how I picked the parameters.
The scheme
Reed-Solomon is parameterized by (k, m) — k data shards and m parity shards per stripe. Any k shards out of the k+m total are enough to reconstruct the original. Three configurations were under consideration:
| scheme | data + parity | storage overhead | survives loss of |
|---|---|---|---|
| 4+2 | 4 data, 2 parity | 50% | 2 chunks per stripe |
| 6+3 | 6 data, 3 parity | 50% | 3 chunks per stripe |
| 8+4 | 8 data, 4 parity | 50% | 4 chunks per stripe |
All three have the same storage overhead. The differences are in encode/decode CPU cost and fault tolerance.
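To make the (k, m) mechanics concrete, here is a minimal sketch of encoding an 8+4 stripe and reconstructing after losing four shards. It uses the klauspost/reedsolomon Go library — an assumption on my part; the post doesn't say what frostvex is written in or which implementation it uses:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon" // assumed library, not confirmed by the post
)

func main() {
	const dataShards, parityShards = 8, 4

	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	// One stripe of data: 8 chunks x 128 KB.
	data := bytes.Repeat([]byte("frostvex"), dataShards*128*1024/8)

	// Split allocates 12 shards (8 data + 4 empty parity);
	// Encode fills in the parity shards.
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing any 4 of the 12 shards — the most 8+4 tolerates.
	shards[0], shards[5], shards[9], shards[11] = nil, nil, nil, nil

	// Any 8 surviving shards are enough to rebuild the rest.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("reconstructed:", ok, err)
}
```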
Encode/decode benchmark
Per-chunk encode timing on a 2023 laptop, single core, 128 KB chunk:
| scheme | encode | decode (1 missing) | decode (m missing) |
|---|---|---|---|
| 4+2 | 22 µs | 31 µs | 52 µs |
| 6+3 | 34 µs | 48 µs | 92 µs |
| 8+4 | 52 µs | 71 µs | 140 µs |
The encode cost scales roughly linearly with stripe size; decode cost scales worse, especially when many shards are missing.
For frostvex's workload, encode is what matters — every chunk goes through encode on the source. Decode only fires when something has gone wrong, so the steady-state CPU cost is the encode cost. 8+4 at 52 µs per 128 KB chunk works out to around 2.5 GB/s of throughput; we're never going to be CPU-bound on the encode path.
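The numbers above came from the author's laptop; if you want your own, a benchmark along these lines measures the encode path. Again, klauspost/reedsolomon is my assumption, not something the post confirms:

```go
package parity_test

import (
	"crypto/rand"
	"testing"

	"github.com/klauspost/reedsolomon" // assumed library
)

// BenchmarkEncode8x4 times parity generation for one 8+4 stripe of
// 128 KB chunks; b.SetBytes makes `go test -bench` report MB/s.
func BenchmarkEncode8x4(b *testing.B) {
	const chunk = 128 * 1024
	enc, err := reedsolomon.New(8, 4)
	if err != nil {
		b.Fatal(err)
	}
	shards := make([][]byte, 12)
	for i := range shards {
		shards[i] = make([]byte, chunk)
		if i < 8 {
			rand.Read(shards[i]) // random data shards; parity left zeroed
		}
	}
	b.SetBytes(8 * chunk) // count only the data bytes
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := enc.Encode(shards); err != nil {
			b.Fatal(err)
		}
	}
}
```

Run it with `go test -bench=Encode8x4` and compare against the table above.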
Why 8+4 won
Two reasons.
The first is a property of large stripes: the larger the stripe, the more simultaneous failures it takes to lose data, and so the more independent those failures have to be. With 4+2, three corrupted chunks in the same stripe mean data loss (the table above shows it survives only two). With 8+4, it takes five. Real failure modes — a NIC glitch, a write to a bad disk sector — tend to cluster, and 8+4 gives more headroom against clustered failures.
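To put a rough number on that headroom: under the optimistic assumption of independent chunk failures (which the clustered failure modes above explicitly violate), a stripe loses data only once more than m chunks fail, and wider stripes push that probability down fast. A toy calculation, with an illustrative 1% per-chunk failure rate I've made up for the example:

```go
package main

import (
	"fmt"
	"math"
)

// binom computes the binomial coefficient C(n, k) in floating point.
func binom(n, k int) float64 {
	r := 1.0
	for i := 1; i <= k; i++ {
		r *= float64(n-k+i) / float64(i)
	}
	return r
}

// lossProb is the chance that a (k+m)-chunk stripe loses data when
// each chunk fails independently with probability p: loss requires
// strictly more than m chunk failures.
func lossProb(k, m int, p float64) float64 {
	n := k + m
	loss := 0.0
	for f := m + 1; f <= n; f++ {
		loss += binom(n, f) *
			math.Pow(p, float64(f)) *
			math.Pow(1-p, float64(n-f))
	}
	return loss
}

func main() {
	for _, s := range [][2]int{{4, 2}, {6, 3}, {8, 4}} {
		fmt.Printf("%d+%d: P(data loss per stripe) = %.1e\n",
			s[0], s[1], lossProb(s[0], s[1], 0.01))
	}
}
```

At that made-up rate, 8+4 comes out a couple of orders of magnitude safer per stripe than 4+2. Correlated failures eat into that margin — which is the point — but the wider stripe starts from a much larger buffer.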
The second is more practical: with our default chunk size of 128 KB, an 8+4 stripe is 1.5 MB. That fits cleanly in a single QUIC stream without fragmentation. 4+2 is 0.75 MB, which is also fine, but more stripes means more bookkeeping per file.
Customizing
Since 0.3, parity is configurable per-pool:
```toml
[parity]
scheme = "reed-solomon"
data = 8
parity = 4
chunk_size_kb = 128
```
If you only care about detection — not correction — set parity = 0. You'll still get BLAKE3 hashes catching corruption, but hashes alone can't repair anything. Useful when storage is precious and you have other backup paths.
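For completeness, detection-only verification is just hash-and-compare. A minimal sketch, assuming the lukechampine.com/blake3 Go binding — the post names BLAKE3 but not a specific library:

```go
package main

import (
	"fmt"

	"lukechampine.com/blake3" // assumed BLAKE3 binding; the post doesn't name one
)

func main() {
	chunk := []byte("chunk payload")
	stored := blake3.Sum256(chunk) // digest recorded at backup time

	// Later, on read-back: recompute and compare. With parity = 0 a
	// mismatch can be reported but not repaired.
	if got := blake3.Sum256(chunk); got != stored {
		fmt.Println("corruption detected; no parity to repair from")
	} else {
		fmt.Println("chunk verified")
	}
}
```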
Related: the [parity] section in the configuration docs.