QUIC for sync — choosing quinn, then patching it
The single biggest decision in frostvex's design was: what runs on the wire? Three options were on the table. TCP with a custom framing layer on top, like rsync. WebSocket. QUIC.
QUIC won, mostly because of two properties that mattered for our specific use case — lossy mobile networks and aggressive carrier-grade NAT. This post is about why, and about the small upstream patches I ended up needing.
What QUIC buys
The two things we couldn't easily get otherwise:
- Connection migration. If your IP address changes mid-transfer — moving between Wi-Fi and LTE, for example — QUIC keeps the connection alive across the change. TCP would have broken and required a full re-handshake.
- Multi-stream within a single connection. We can ship the manifest, the chunks, and an out-of-band control channel as separate streams that don't head-of-line block each other (see the sketch after this list). With TCP that requires either multiple connections (and multiple TLS handshakes) or some kind of in-band multiplexing.
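Concretely, here's what that looks like with quinn. A minimal sketch, assuming an established quinn::Connection; the stream roles match the ones above, but the names and payloads are illustrative, not frostvex's actual wire format:

```rust
// Sketch: three independent streams on one QUIC connection.
async fn open_sync_streams(
    conn: &quinn::Connection,
) -> Result<(), Box<dyn std::error::Error>> {
    // Manifest: bidirectional, request/response style.
    let (mut manifest_tx, _manifest_rx) = conn.open_bi().await?;
    // Chunk data: unidirectional, one stream per chunk batch.
    let mut chunk_tx = conn.open_uni().await?;
    // Control: long-lived bidirectional stream for out-of-band messages.
    let (_control_tx, _control_rx) = conn.open_bi().await?;

    // A stall or loss burst on `chunk_tx` doesn't head-of-line
    // block the manifest or control streams.
    manifest_tx.write_all(b"manifest v1\n").await?;
    chunk_tx.write_all(b"chunk 0\n").await?;
    Ok(())
}
```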
The downside is real: QUIC runs in userspace, which means more CPU than kernel TCP, and frostvex inherits some of the operational complexity of UDP (firewalls being unfriendly, NAT timeouts being short).
Why quinn
The Rust QUIC library landscape is small. The serious options are quinn and quiche. I chose quinn because:
- It's pure Rust, no C dependencies. Cross-compiling for aarch64 and Windows from one CI runner just works.
- The async story is excellent. Drops into tokio cleanly.
- The maintainers are responsive. I've filed three issues; all three got reasonable responses inside a week.
quiche would have been faster — Cloudflare runs it at scale and it shows. But the C dependencies and the more bespoke async layer pushed me toward quinn.
The patches
quinn does most things right out of the box. Three things needed work for our use case:
1. Configurable keepalive
The default keepalive is 60 seconds. On aggressive carrier-grade NAT — the kind you find on mobile networks in Asia and Eastern Europe — UDP flow timeout can be as low as 30 seconds. We needed to send keepalives more frequently.
The patch was a one-liner: expose keep_alive_interval() as a runtime-configurable setting. Upstream PR, merged in quinn 0.10.x.
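With that in place, wiring a shorter keepalive into a client is a few lines. A sketch using quinn's TransportConfig; the 15-second interval is illustrative (chosen here only to sit under the ~30-second CGNAT flow timeout), and the rest of the client setup is elided:

```rust
use std::{sync::Arc, time::Duration};

// Sketch: send keepalives every 15 s, comfortably under a ~30 s
// carrier-grade NAT flow timeout. `client_cfg` is an existing
// quinn::ClientConfig built elsewhere.
fn apply_keepalive(client_cfg: &mut quinn::ClientConfig) {
    let mut transport = quinn::TransportConfig::default();
    transport.keep_alive_interval(Some(Duration::from_secs(15)));
    client_cfg.transport_config(Arc::new(transport));
}
```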
2. Per-stream priority
quinn schedules streams round-robin by default. For us, the manifest stream needs to take priority over chunk streams — if it's blocked behind a chunk transfer, sync stalls.
The patch added stream priorities to the public API. A larger change, and it took a few rounds of review. Landed in quinn 0.11.
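In current quinn this surfaces as SendStream::set_priority, where higher values are scheduled first. A sketch, assuming manifest_tx and chunk_tx are the open SendStream handles from the earlier example:

```rust
// Sketch: schedule the manifest ahead of bulk chunk data. Higher
// priority values are sent first; 0 is the default.
manifest_tx.set_priority(1).expect("manifest stream still open");
chunk_tx.set_priority(0).expect("chunk stream still open");
```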
3. Connection migration on idle
By default, quinn migrates connection state aggressively when it sees a new path. We wanted the opposite: hold the existing path until packet loss exceeded a threshold, then migrate. Otherwise minor router blips cause unnecessary churn.
This one I've kept as a frostvex-local fork rather than upstreaming, because the policy decision is application-specific. It's about 40 lines of Rust on top of quinn::Connection.
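The fork itself isn't shown here, but the shape of the policy check is roughly this. Connection::stats() and its per-path loss counters are real quinn API; the should_migrate function, the point where the fork consults it, and the 2% threshold are assumptions for illustration:

```rust
use quinn::Connection;

// Hypothetical policy gate in the style of the frostvex fork: hold the
// current path until observed loss crosses a threshold. The hook that
// calls this from the migration code path lives in the fork and is
// assumed here.
fn should_migrate(conn: &Connection, loss_threshold: f64) -> bool {
    let path = conn.stats().path;
    if path.sent_packets == 0 {
        return false; // nothing sent yet; keep the existing path
    }
    let loss = path.lost_packets as f64 / path.sent_packets as f64;
    loss > loss_threshold
}

// Usage: migrate only once loss on the current path exceeds 2%.
// let migrate = should_migrate(&conn, 0.02);
```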
What I'd do differently
Probably nothing. QUIC has paid off — see the benchmarks for the LTE case. The connection-migration property in particular has been worth the extra complexity.
If I were starting in 2026 instead of 2024, I'd consider s2n-quic from AWS, which has matured a lot. But I'm not going to rewrite the transport layer just to switch libraries.
Related: the [transport] section of the config docs.