CVE-2026-46176: Linux mlx5 RDMA: use-after-free in SRQ init path
AWAITING NVD
A use-after-free and double-free bug in the Linux kernel's mlx5 RDMA driver corrupts kernel memory when Shared Receive Queue (SRQ) initialization fails during device setup, potentially enabling local privilege escalation on affected hosts. The primary AI/ML exposure is distributed training infrastructure running Mellanox/NVIDIA ConnectX adapters for high-speed GPU interconnects over InfiniBand or RoCE, the backbone of most large-scale model training clusters. No CVSS score has been assigned, no public exploit exists, and the vulnerability is absent from CISA KEV, so immediate risk is low; exploitation requires local access and advanced kernel heap manipulation skills. Apply the stable kernel patches referenced in the advisory once your distribution backports them, and audit which training nodes run mlx5-based RDMA to scope the patching inventory.
What is the risk?
Immediate risk is low: no CVSS score has been published, no public exploit is known, and no exploitation in the wild has been reported. The vulnerable code path is in the RDMA device initialization error branch, meaning the trigger window is narrow, limited to driver load or adapter reset events under specific failure conditions. Exploitation requires local access to a host with a Mellanox/NVIDIA mlx5 NIC, the ability to force the s1 SRQ allocation to fail, and advanced kernel heap-shaping knowledge to convert memory corruption into code execution. Risk rises in multi-tenant GPU clusters where multiple workloads share one host kernel, because a privileged container could attempt to exploit the bug to escape to the host.
What should I do?
1. Apply the stable kernel patches from the git commits referenced in the CVE advisory (6fd93142dd1d, a13c2ac4d480, b087913ae882, bc2cf5935b46, c488df06bd55).
2. Monitor vendor security advisories for distribution-specific backports: Red Hat RHSA, Ubuntu USN, SUSE SUSE-SU, Debian DSA.
3. If RDMA is not actively required on a node, blacklist the mlx5_ib kernel module by adding 'blacklist mlx5_ib' to a .conf file under /etc/modprobe.d/ and rebooting. Note that a blacklist entry only prevents automatic loading; an explicit modprobe still loads the module.
4. Prioritize patching multi-tenant GPU training nodes where untrusted workloads run alongside sensitive AI pipelines.
5. Enable kdump/crash reporting to detect exploitation attempts via unexpected kernel oops events on RDMA-equipped hosts.
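To scope the inventory, it helps to know which nodes actually have mlx5_ib loaded. The following is a minimal sketch in C, assuming the standard /proc/modules layout in which the module name is the first space-separated field of each line. The helper names line_names_module() and module_loaded() are hypothetical and chosen for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `line` (one /proc/modules line) starts with exactly `name`.
 * The name must be followed by a space, newline, or end of string, so
 * "mlx5_ib_extra" or a dependency listed later on the line does not match. */
static bool line_names_module(const char *line, const char *name)
{
    size_t n = strlen(name);

    return strncmp(line, name, n) == 0 &&
           (line[n] == ' ' || line[n] == '\n' || line[n] == '\0');
}

/* Scans a /proc/modules-style file.
 * Returns 1 if `name` is listed, 0 if not, -1 if the file cannot be opened. */
static int module_loaded(const char *path, const char *name)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (line_names_module(line, name)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

On a Linux host, module_loaded("/proc/modules", "mlx5_ib") returning 1 marks the node as in scope for patching. A result of 0 only means the module is not loaded right now, not that the hardware is absent.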
Frequently Asked Questions
What is CVE-2026-46176?
A use-after-free and double-free bug in the Linux kernel's mlx5 RDMA driver corrupts kernel memory when Shared Receive Queue initialization fails during device setup, potentially enabling local privilege escalation on affected hosts. The primary AI/ML exposure is distributed training infrastructure running Mellanox/NVIDIA ConnectX adapters for high-speed GPU interconnects over InfiniBand or RoCE—the backbone of most large-scale model training clusters. No CVSS score has been assigned, no public exploit exists, and the vulnerability is absent from CISA KEV, placing immediate risk as low; exploitation requires local access and advanced kernel heap manipulation skills. Apply the stable kernel patches referenced in the advisory when your distribution backports them, and audit which training nodes run mlx5-based RDMA to scope the patching inventory.
Is CVE-2026-46176 actively exploited?
No confirmed active exploitation of CVE-2026-46176 has been reported, but organizations should still patch proactively.
How to fix CVE-2026-46176?
1. Apply the stable kernel patches from the git commits referenced in the CVE advisory (6fd93142dd1d, a13c2ac4d480, b087913ae882, bc2cf5935b46, c488df06bd55).
2. Monitor vendor security advisories for distribution-specific backports: Red Hat RHSA, Ubuntu USN, SUSE SUSE-SU, Debian DSA.
3. If RDMA is not actively required on a node, blacklist the mlx5_ib kernel module by adding 'blacklist mlx5_ib' to a .conf file under /etc/modprobe.d/ and rebooting.
4. Prioritize patching multi-tenant GPU training nodes where untrusted workloads run alongside sensitive AI pipelines.
5. Enable kdump/crash reporting to detect exploitation attempts via unexpected kernel oops events on RDMA-equipped hosts.
What systems are affected by CVE-2026-46176?
This vulnerability affects the following AI/ML architecture patterns: distributed AI training clusters with RDMA networking, GPU compute infrastructure using InfiniBand or RoCE, model serving nodes with Mellanox/NVIDIA ConnectX adapters.
What is the CVSS score for CVE-2026-46176?
No CVSS score has been assigned yet.
AI Security Impact
MITRE ATLAS Techniques
- AML.T0105 Escape to Host
- AML.T0112 Machine Compromise
Technical Details
Original Advisory
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init()
mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When ib_create_srq() fails for s1, the error branch destroys s0 but falls through and unconditionally assigns the freed s0 and the ERR_PTR s1 to devr->s0 and devr->s1.
This leads to several problems:
- the lock-free fast path checks "if (devr->s1) return 0;" and treats the ERR_PTR as already initialised;
- users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via to_msrq(devr->s0)->msrq.srqn; and
- mlx5_ib_dev_res_cleanup() dereferences the ERR_PTR and double-frees s0 on teardown.
Fix by adding the same `goto unlock` in the s1 failure path.
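The fall-through described in the advisory can be modelled in userspace C. This is a sketch under stated assumptions, not the kernel source: struct srq, struct devr, create_srq(), destroy_srq(), srq_init() and its `fixed` flag are hypothetical stand-ins, locking is omitted, and only err_ptr()/is_err() mirror the kernel's ERR_PTR()/IS_ERR() convention of encoding a negative errno as a pointer value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095
#define MODEL_ENOMEM 12

/* Userspace equivalents of ERR_PTR()/IS_ERR(): a small negative errno
 * stored in a pointer, which is therefore never NULL. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct srq { bool live; };             /* hypothetical stand-in for an SRQ object */
struct devr { struct srq *s0, *s1; };  /* hypothetical stand-in for the devr slots */

/* Static storage, so a "destroyed" SRQ stays inspectable in this model. */
static struct srq pool[2];

static struct srq *create_srq(int idx, bool fail)
{
    if (fail)
        return err_ptr(-MODEL_ENOMEM);
    pool[idx].live = true;
    return &pool[idx];
}

static void destroy_srq(struct srq *s) { s->live = false; }

/* fixed == false models the fall-through; fixed == true models the
 * added `goto unlock` in the s1 failure path. */
static int srq_init(struct devr *devr, bool s1_fails, bool fixed)
{
    struct srq *s0, *s1;
    int ret = 0;

    assert(devr != NULL);

    /* Fast path: any non-NULL value, including an error pointer,
     * is treated as "already initialised". */
    if (devr->s1)
        return 0;

    s0 = create_srq(0, false);
    s1 = create_srq(1, s1_fails);
    if (is_err(s1)) {
        ret = -MODEL_ENOMEM;
        destroy_srq(s0);
        if (fixed)
            goto unlock;
        /* Without the goto, execution falls through to the stores below. */
    }

    devr->s0 = s0;
    devr->s1 = s1;
unlock:
    return ret;
}
```

In the unfixed variant, a failed s1 allocation leaves devr->s0 pointing at a destroyed object and devr->s1 holding an error pointer, and the next call returns 0 from the fast path as if setup had succeeded. In the fixed variant both slots stay NULL, so a later call retries initialization.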
Exploitation Scenario
A local attacker with unprivileged access to a shared GPU training node deliberately creates memory pressure or another resource exhaustion condition during mlx5 RDMA adapter initialization, causing the ib_create_srq() call for s1 to fail. The vulnerable error path destroys s0 but falls through, writing the freed s0 pointer and an ERR_PTR for s1 into the device-wide devr->s0 and devr->s1 slots. The fast-path check then treats the ERR_PTR as an initialized SRQ and returns success, so mlx5_ib_create_qp() later dereferences the freed s0. The attacker uses slab heap grooming to place a controlled allocation at the freed s0 address, gaining a kernel write primitive, escalating to root, and escaping any container isolation protecting co-resident AI training workloads.
References
- git.kernel.org/stable/c/6fd93142dd1d09000c3750af08270f5792523fe9
- git.kernel.org/stable/c/a13c2ac4d480b734342c6fbf8249fc48afd675f3
- git.kernel.org/stable/c/b087913ae88256df66620f7ba0a9776716aeef7e
- git.kernel.org/stable/c/bc2cf5935b4665172235341163315905197ae91d
- git.kernel.org/stable/c/c488df06bd552bb8b6e14fa0cfd5ad986c6e9525
Related Vulnerabilities
- CVE-2024-2912 10.0 BentoML: RCE via insecure deserialization (CVSS 10). Same attack type: Code Execution
- CVE-2026-21858 10.0 n8n: Input Validation flaw enables exploitation. Same attack type: Code Execution
- CVE-2025-5120 10.0 smolagents: sandbox escape enables unauthenticated RCE. Same attack type: Code Execution
- CVE-2025-59528 10.0 Flowise: Unauthenticated RCE via MCP config injection. Same attack type: Code Execution
- GHSA-vvpj-8cmc-gx39 10.0 picklescan: security flaw enables exploitation. Same attack type: Code Execution