Teaching libuv's UDP Stack New Tricks

For the past several years I have been working on bringing native QUIC support to Node.js. One of the recurring issues has been that libuv's UDP handle -- uv_udp_t -- was designed for simple fire-and-forget datagram use cases. It gives you a buffer, a source address, and that's about it. QUIC needs significantly more from the UDP layer -- but so does any serious UDP protocol. Game servers need to know which interface a packet arrived on. Real-time media transports need congestion signals without packet loss. High-throughput services need to send and receive thousands of packets per second without drowning in system call overhead.

I recently put up a proposed patch for the libuv v2 branch that adds these capabilities. The PR is still under review and may change based on maintainer feedback, but if it lands the existing uv_udp_t handle would gain support for ECN (Explicit Congestion Notification), Path MTU Discovery, local destination address reporting (pktinfo), and GSO/GRO offloads -- all opt-in, all backward compatible with the existing API. QUIC is the immediate motivation, but these are general-purpose features that any UDP-based application could use.

Why QUIC needs more from UDP

QUIC runs over UDP, but it is not a casual UDP user. The protocol specification (RFC 9000) and its companion documents require the transport layer to support several features that traditional UDP APIs don't expose:

  • ECN -- QUIC mandates ECN validation during the handshake and uses ECN feedback for congestion control. The transport needs to read the ECN codepoint from every incoming packet and set it on every outgoing packet.

  • Path MTU Discovery -- QUIC needs to set the Don't Fragment bit and detect EMSGSIZE errors to probe the path MTU. Without this, QUIC either fragments (which UDP shouldn't do) or uses a conservative 1200-byte packet size.

  • Pktinfo -- When a server is bound to a wildcard address, it needs to know which local address and interface each packet arrived on so it can respond from the same address. QUIC connection IDs are scoped per-path, and the local address is part of the path identity.

  • GSO/GRO -- High-throughput QUIC implementations batch multiple QUIC packets into a single system call. Linux's UDP_SEGMENT (GSO) lets you hand the kernel one large buffer and a segment size; the kernel splits it into individual datagrams at the NIC level. UDP_GRO is the receive-side counterpart. Without these, each packet requires a separate sendmsg/recvmsg call.

None of these are exotic kernel features. They have been available on Linux for years, and most have equivalents on macOS, FreeBSD, and Windows. But libuv never exposed them because the original UDP API predates QUIC's standardization.

Design: extend, don't replace

The initial prototype introduced a new uv_udp2_t handle type. Feedback from the libuv maintainers was clear: don't add a new type, extend the existing one. Since uv_udp_t and the proposed uv_udp2_t had identical struct layouts -- the only difference being the callback signature -- this turned out to be straightforward.

The approach:

  1. New bind flags (UV_UDP_RECVECN, UV_UDP_PMTUD, UV_UDP_RECVPKTINFO, UV_UDP_GRO, etc.) opt in to features at bind time.
  2. uv_udp_recv_start2() registers a new-style callback that receives a uv_udp_recv_t struct containing all per-packet metadata.
  3. Internally, the callback is stored in the same recv_cb field (both are function pointers of the same size) with a flag bit controlling dispatch.
  4. The uv_udp_t struct layout does not change. Existing code is unaffected.

If you don't use the new flags or recv_start2, your UDP code works exactly as before.
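
For comparison, the existing receive path keeps working untouched. This is today's libuv API, shown only to make the compatibility point concrete:

static void on_alloc(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf) {
  static char slab[65536];
  buf->base = slab;
  buf->len = sizeof(slab);
}

static void on_recv(uv_udp_t* handle, ssize_t nread, const uv_buf_t* buf,
                    const struct sockaddr* addr, unsigned flags) {
  if (nread > 0) {
    /* process nread bytes from buf->base, sent by addr */
  }
}

uv_udp_init(loop, &handle);
uv_udp_bind(&handle, addr, 0);                 /* no new flags */
uv_udp_recv_start(&handle, on_alloc, on_recv); /* old-style callback */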

The proposed APIs

Enhanced receive with uv_udp_recv_start2

The new receive callback would get a single uv_udp_recv_t struct instead of separate parameters:

typedef struct {
  ssize_t nread;
  const uv_buf_t* buf;
  const struct sockaddr* addr;       /* source address */
  struct sockaddr_storage local;     /* destination (local) address */
  unsigned int ifindex;              /* receiving interface index */
  int ecn;                           /* ECN codepoint: 0-3 */
  unsigned int flags;
  unsigned int segment_size;         /* GRO segment stride */
} uv_udp_recv_t;

The metadata fields are populated based on which flags you enabled at bind time. Fields for disabled features are zero.
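
The exact callback signature is for the PR to settle, but a new-style callback would presumably receive the handle plus a pointer to this struct -- roughly along these lines (a sketch, not the final signature):

static void recv_cb(uv_udp_t* handle, const uv_udp_recv_t* recv) {
  if (recv->nread <= 0)
    return;  /* error or zero-length datagram */
  /* recv->buf and recv->addr are the familiar payload and source address;
   * recv->local, recv->ifindex, recv->ecn and recv->segment_size carry the
   * opt-in metadata for whatever flags were set at bind time. */
}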

ECN

ECN allows routers to signal congestion by marking packets rather than dropping them. A sender marks outgoing packets as ECN-capable (ECT(0) or ECT(1)). If a router along the path is congested, it sets the Congestion Experienced (CE) mark instead of dropping the packet. The receiver reads this mark and reports it back to the sender, which can then reduce its sending rate before packets start getting lost. This gives QUIC's congestion control earlier and more precise feedback than relying on loss detection alone.

uv_udp_bind(&handle, addr, UV_UDP_RECVECN);
uv_udp_set_ecn(&handle, 2);  /* mark outgoing packets ECT(0) */
uv_udp_recv_start2(&handle, alloc_cb, recv_cb);
 
/* In recv_cb: */
printf("ECN codepoint: %d\n", recv->ecn);
/* 0 = Not-ECT, 1 = ECT(1), 2 = ECT(0), 3 = CE */
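
What you do with the codepoint is protocol-specific. A QUIC endpoint, for instance, keeps per-codepoint counters and echoes them back to the peer in ACK frames (RFC 9000's ECN counts). A minimal, illustrative tally -- not part of the proposed API:

struct ecn_counts { uint64_t ect0, ect1, ce; };

static void tally_ecn(struct ecn_counts* c, int ecn) {
  switch (ecn) {
    case 1: c->ect1++; break;  /* ECT(1) */
    case 2: c->ect0++; break;  /* ECT(0) */
    case 3: c->ce++;   break;  /* CE: congestion experienced */
    default: break;            /* Not-ECT */
  }
}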

Path MTU Discovery

Every network path has a maximum transmission unit (MTU) -- the largest packet that can traverse the path without fragmentation. The problem is that the sender doesn't know this value in advance; it depends on every link between source and destination. Path MTU Discovery works by sending packets with the Don't Fragment (DF) bit set. A packet too large for some link along the path is then dropped rather than silently fragmented, and the failure surfaces to the sender as an EMSGSIZE error -- immediately if the packet exceeds the local MTU, or on a later send once the kernel has learned the smaller path MTU.

This matters for QUIC because the protocol forbids IP fragmentation entirely. QUIC's minimum packet size is 1200 bytes, but many paths support larger packets. Without PMTUD, a QUIC implementation is stuck at 1200 bytes per packet. With it, QUIC can probe upward and send larger packets, which means fewer packets for the same amount of data and better throughput.

uv_udp_bind(&handle, addr, UV_UDP_PMTUD);
/* DF bit is set; oversized sends fail with UV_EMSGSIZE */
 
uv_udp_set_pmtud(&handle, UV_UDP_PMTUD_PROBE);  /* default */
uv_udp_set_pmtud(&handle, UV_UDP_PMTUD_DO);      /* enforce kernel cache */
uv_udp_set_pmtud(&handle, UV_UDP_PMTUD_OFF);     /* disable */
 
/* Query the cached path MTU (connected sockets, Linux/Windows): */
size_t mtu;
uv_udp_getmtu(&handle, &mtu);
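
On top of this, a transport can probe: send a padded datagram at a candidate size and treat UV_EMSGSIZE as "the path is smaller than that". A rough sketch using the existing uv_udp_try_send (probe_buf and dest are illustrative placeholders):

uv_buf_t probe = uv_buf_init(probe_buf, 1452);  /* candidate size above QUIC's 1200-byte floor */
int r = uv_udp_try_send(&handle, &probe, 1, dest);
if (r == UV_EMSGSIZE) {
  /* the path MTU is below 1452: keep the current packet size,
   * retry with a smaller probe later */
} else if (r >= 0) {
  /* the probe fit: larger packets can be used on this path */
}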

Packet info (local address + interface)

When a UDP server binds to 0.0.0.0 or :: (the wildcard address), the kernel accepts packets destined for any of the machine's IP addresses. But the standard recvfrom call only tells you who sent the packet, not which local address it was sent to or which network interface it arrived on. The IP_PKTINFO socket option (and its IPv6 counterpart IPV6_PKTINFO) fills in those missing pieces.

QUIC needs this because connection state is tied to the 4-tuple: source address, source port, destination address, destination port. If a server has multiple addresses and doesn't know which one a packet was addressed to, it can't correctly associate the packet with a connection, and it can't send responses from the right source address. This is especially important for connection migration, where a client may switch networks and the server needs to track which path each packet belongs to.

uv_udp_bind(&handle, wildcard_addr, UV_UDP_RECVPKTINFO);
 
/* In recv_cb: */
if (recv->local.ss_family == AF_INET) {
  struct sockaddr_in* a = (struct sockaddr_in*) &recv->local;
  /* a->sin_addr is the destination address the packet was sent to */
}
printf("received on interface %u\n", recv->ifindex);

GSO batch send

Sending UDP packets one at a time with sendmsg is expensive. Each call crosses the user/kernel boundary, acquires socket locks, and traverses the network stack independently. At high packet rates -- and QUIC servers routinely send thousands of packets per second -- this overhead dominates.

Generic Segmentation Offload (GSO) via the UDP_SEGMENT socket option lets you hand the kernel a single large buffer along with a segment size. The kernel (or the NIC itself, if it supports hardware offload) splits the buffer into individual datagrams. One system call, one lock acquisition, one trip through the stack -- but many packets out the wire. Paired with sendmmsg for batching multiple messages, this is how production QUIC servers achieve line-rate throughput.

uv_udp_mmsg_t msgs[2];
 
/* Each message can have its own destination and GSO segment size. */
msgs[0].bufs = &buf0;
msgs[0].nbufs = 1;
msgs[0].addr = dest;
msgs[0].gso_size = 1200;  /* kernel splits buf into 1200-byte datagrams */
msgs[0].txtime = 0;
 
msgs[1].bufs = &buf1;
msgs[1].nbufs = 1;
msgs[1].addr = dest;
msgs[1].gso_size = 0;     /* normal send, no segmentation */
msgs[1].txtime = 0;
 
int sent = uv_udp_try_send_batch(&handle, msgs, 2);

On Linux 4.18+ with gso_size > 0, the kernel (or the NIC, where hardware offload is available) performs the segmentation, so one system call covers many packets. On other platforms, the buffer is sent as a single datagram.

GRO receive

GRO is the receive-side counterpart to GSO. Without it, the kernel delivers each incoming UDP datagram individually, triggering a separate recvmsg call and user/kernel transition for every packet. Under high inbound packet rates this becomes the bottleneck -- the application spends more time in system calls than processing data.

With UDP_GRO enabled, the kernel coalesces consecutive datagrams from the same source that have the same size into a single "super-packet." The application gets one large buffer and a segment size indicating where the boundaries are. libuv can either split these back into individual callbacks (UV_UDP_GRO) or hand the raw coalesced buffer to the application (UV_UDP_GRO_RAW) for zero-copy processing.

uv_udp_bind(&handle, addr, UV_UDP_GRO);
uv_udp_recv_start2(&handle, alloc_cb, recv_cb);
 
/* On Linux 5.0+, the kernel may coalesce consecutive same-source datagrams.
 * libuv splits them back and calls recv_cb once per segment.
 * recv->segment_size indicates the original segment stride. */

With UV_UDP_GRO_RAW, libuv delivers the coalesced super-packet as-is and the application is responsible for splitting by segment_size.
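
That split is a simple stride walk; only the final segment may be shorter. A sketch using the uv_udp_recv_t fields shown above (process_packet is a hypothetical per-datagram handler):

size_t stride = recv->segment_size ? recv->segment_size : (size_t) recv->nread;
size_t off = 0;
while (off < (size_t) recv->nread) {
  size_t len = (size_t) recv->nread - off;
  if (len > stride)
    len = stride;
  process_packet(recv->buf->base + off, len);  /* hypothetical handler */
  off += len;
}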

Platform support

Not every feature is available everywhere. The API is designed so that unsupported flags are silently ignored -- the socket remains usable, you just don't get the metadata.

Feature            Linux        macOS     FreeBSD   Windows
ECN (IPv4/IPv6)    Yes          Yes       Yes       Yes
PMTUD              Yes          Yes       Yes       Yes
Pktinfo (IPv4)     Yes          Yes       Yes       Yes
Pktinfo (IPv6)     Yes          no-op     Yes       Yes
getmtu             Yes          ENOTSUP   ENOTSUP   Yes
GSO                Yes (4.18+)  no-op     no-op     no-op
GRO                Yes (5.0+)   no-op     no-op     no-op

The Linux-only features (GSO, GRO, SO_TXTIME) are the highest-impact for throughput, but ECN and PMTUD are the ones QUIC actually requires -- and those work cross-platform.

What's next

This PR targets the libuv v2 branch (master) and is currently under review. The API surface and implementation details may evolve based on feedback from the libuv maintainers. If it lands and libuv 2.0 ships with these features, the node:quic implementation would be able to use them directly instead of the current workarounds. The practical impact:

  • ECN: would enable proper congestion control feedback, which matters for real-world QUIC performance.
  • PMTUD: would let QUIC probe for larger packet sizes, improving throughput on paths that support jumbo frames or just exceed the conservative 1200-byte minimum.
  • Pktinfo: would fix connection migration and wildcard-address server bugs that currently require platform-specific hacks.
  • GSO/GRO: would be the biggest throughput win -- early benchmarks from other QUIC implementations show 2-5x improvement in packets-per-second when GSO is enabled.

While QUIC is the driving use case, these APIs are general-purpose. ECN benefits any congestion-aware UDP protocol -- WebRTC, RTP media streams, and custom game networking protocols can all use congestion signals to adapt bitrates without waiting for packet loss. PMTUD helps any protocol that wants to maximize packet sizes without fragmentation. Pktinfo is essential for any multi-homed server that needs to respond from the correct source address. And GSO/GRO are a straightforward throughput multiplier for anything that sends or receives UDP at scale -- DNS servers, log forwarders, metrics pipelines, game servers with high player counts.

Keep an eye on the PR. Feedback is welcome.