The Linux 6.19 merge window brings several meaningful updates to the fast-growing io_uring subsystem. Developers focused on both performance and maintainability this cycle. As a result, io_uring now includes extensive code clean-ups that streamline the subsystem and reduce long-term technical debt. Additionally, support for mixed-size SQEs (Submission Queue Entries) has been added, so a single ring can carry entries of different sizes. This update improves flexibility for developers who depend on io_uring for high-performance asynchronous I/O.
Moreover, io_uring receives zero-copy receive (ZCRX) updates, which help applications reduce memory overhead and lower data movement costs. This change directly benefits high-throughput workloads such as network servers. The subsystem also gained improved ring initialization, making io_uring setup more predictable and efficient. Furthermore, support for getsockname() and getpeername() expands io_uring’s networking capabilities, allowing developers to query socket information through the ring instead of issuing separate, traditional system calls.
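To make the mixed-size SQE idea more concrete, here is a minimal toy model in Python. It is not the kernel or liburing API: the class name MixedRing, the 64-byte slot size used for a normal entry, and the rule that a double-size entry is padded with a no-op rather than wrapping around the end of the ring are all assumptions made for illustration. The model also skips capacity checks.

```python
from dataclasses import dataclass, field

SLOT_BYTES = 64  # assumed size of a normal SQE slot in this toy model


@dataclass
class MixedRing:
    """Toy submission ring made of fixed 64-byte slots (hypothetical)."""
    slots: int
    tail: int = 0
    entries: list = field(default_factory=list)  # (start_slot, kind) pairs

    def push(self, big: bool = False) -> int:
        """Queue one entry and return the slot index where it starts."""
        need = 2 if big else 1
        start = self.tail % self.slots
        # Assumption: a 128-byte entry must occupy two contiguous slots,
        # so pad the last slot with a no-op instead of wrapping.
        if big and start == self.slots - 1:
            self.entries.append((start, "pad"))
            self.tail += 1
            start = 0
        self.entries.append((start, "sqe128" if big else "sqe64"))
        self.tail += need
        return start


ring = MixedRing(slots=4)
ring.push()          # normal entry in slot 0
ring.push(big=True)  # double-size entry in slots 1-2
ring.push(big=True)  # slot 3 is padded, entry lands in slots 0-1
print(ring.entries)
```

The point of the sketch is only that normal and double-size entries can share one ring, so an application does not have to make every entry large just because a few operations need the extra space.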
NVIDIA-Driven Enhancements to Block Layer P2P DMA
Moving to the block subsystem, Linux 6.19 introduces a significant set of improvements focused on peer-to-peer (P2P) DMA support. This work was primarily driven by NVIDIA engineer Leon Romanovsky, who addressed long-standing issues around MMIO memory handling in P2P DMA transfers involving block devices.
Previously, the memory involved in P2P DMA transfers routed through the host bridge was not correctly marked as MMIO memory. This caused several issues, including:
Incorrect or unnecessary CPU cache synchronization
Improper DMA mapping and unmapping operations
Missing IOMMU setup for MMIO-based memory regions
Linux 6.19 now resolves these flaws by enhancing MMIO memory recognition and improving P2P DMA mappings across both the block layer and NVMe drivers. This update is crucial for high-performance storage solutions, especially where GPUs or other accelerators interact directly with NVMe devices.
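The three issues listed above can be summarized as a small decision table. The following Python sketch is a conceptual model only, not kernel code: the Route names, the mapping_plan function, and the exact rules encoded here are assumptions chosen to mirror the description in this article.

```python
from enum import Enum, auto


class Route(Enum):
    """Hypothetical labels for how a DMA target is reached."""
    BUS_ADDR = auto()          # peers reach each other below the host bridge
    THRU_HOST_BRIDGE = auto()  # P2P traffic routed through the host bridge
    HOST_MEMORY = auto()       # ordinary system RAM, not P2P


def mapping_plan(route: Route, iommu_enabled: bool) -> dict:
    """Return what a DMA mapping step should do in this simplified model."""
    return {
        # P2P memory behind the host bridge is device MMIO, not RAM.
        "treat_as_mmio": route is Route.THRU_HOST_BRIDGE,
        # CPU cache maintenance only makes sense for ordinary RAM.
        "may_need_cpu_cache_sync": route is Route.HOST_MEMORY,
        # Assumption: anything crossing the host bridge needs an IOMMU
        # mapping when an IOMMU is active, MMIO regions included.
        "iommu_mapping": iommu_enabled and route is not Route.BUS_ADDR,
    }


print(mapping_plan(Route.THRU_HOST_BRIDGE, iommu_enabled=True))
```

In this model, a region that is not recognized as MMIO would be handled like HOST_MEMORY, which corresponds to the unnecessary cache synchronization and incorrect mapping behavior described above.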
Additional Block Layer Improvements
Along with P2P DMA enhancements, the Linux 6.19 block subsystem includes several other noteworthy updates. The auto-integrity code has been improved, making integrity checking more reliable on modern storage devices. In addition, the kernel now speeds up polled I/O handling, which benefits latency-critical workloads.
The release also fixes blk-throttle issues for SSDs, ensuring more accurate I/O control. Support for caching zones, several MD (multiple device) fixes, and better use of per-CPU workqueues in bcache contribute to overall stability and performance. Furthermore, the block tracing infrastructure now supports zoned devices, offering deeper visibility into device behavior.

