CVE-2026-46176: use-after-free in SRQ init path

CISO Take

A use-after-free and double-free bug in the Linux kernel's mlx5 RDMA driver corrupts kernel memory when Shared Receive Queue (SRQ) initialization fails during device setup, potentially enabling local privilege escalation on affected hosts. The primary AI/ML exposure is distributed training infrastructure running Mellanox/NVIDIA ConnectX adapters for high-speed GPU interconnects over InfiniBand or RoCE, the backbone of most large-scale model training clusters. The flaw carries a CVSS 3.1 base score of 7.8 (High), but no public exploit exists and it is absent from CISA KEV, so immediate risk is low; exploitation requires local access and advanced kernel heap manipulation skills. Apply the stable kernel patches referenced in the advisory once your distribution backports them, and audit which training nodes run mlx5-based RDMA to scope the patching inventory.

Sources: NVD, MITRE ATLAS

What is the risk?

Immediate risk is low: although the flaw carries a CVSS 3.1 base score of 7.8, there are no known public exploits and no active exploitation in the wild. The vulnerable code is an error branch in the RDMA device-resource initialization path, so the trigger window is narrow: it is limited to driver load or adapter reset events under specific failure conditions. Exploitation requires local access to a host with a Mellanox/NVIDIA mlx5 NIC, the ability to force the s1 SRQ allocation to fail, and advanced kernel heap-shaping knowledge to turn memory corruption into code execution. Risk rises in multi-tenant shared GPU clusters where multiple workloads share one kernel, since a privileged container could attempt to exploit the bug to escape to the host.

How does the attack unfold?

Local Access

Attacker obtains legitimate or compromised local user access to a Linux host with Mellanox/NVIDIA mlx5 RDMA adapters, such as a shared GPU training node.

AML.T0012

Error Path Trigger

Attacker induces a failure in the s1 SRQ allocation during mlx5_ib driver initialization (e.g., via memory pressure), activating the vulnerable fall-through code path that assigns freed and ERR_PTR values to device-global slots.

Kernel Heap Corruption

The lock-free fast path treats the ERR_PTR stored in devr->s1 as an already-initialized SRQ; mlx5_ib_create_qp() then dereferences the freed s0 SRQ (via to_msrq(devr->s0)->msrq.srqn), and mlx5_ib_dev_res_cleanup() double-frees s0 on teardown, corrupting kernel heap memory.

Privilege Escalation

Attacker leverages heap corruption primitives to escalate to kernel code execution, achieving root privileges—enabling container escape, disruption of AI training jobs, or exfiltration of model weights and credentials from co-resident workloads.

AML.T0112

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.1%

chance of exploitation in 30 days

Higher than 4% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

No known exploitation

Sophistication

Advanced

What is the attack surface?

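Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

Vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H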

What should I do?

5 steps

Apply the kernel stable patches from git commits referenced in the CVE advisory (6fd93142dd1d, a13c2ac4d480, b087913ae882, bc2cf5935b46, c488df06bd55).
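Monitor vendor security advisories for distribution-specific backports: Red Hat RHSA, Ubuntu USN, SUSE security updates (SUSE-SU), Debian DSA.
If RDMA is not actively required on a node, blacklist the mlx5_ib kernel module by adding 'blacklist mlx5_ib' to a .conf file under /etc/modprobe.d, regenerating the initramfs if it includes the module, and rebooting. A blacklist entry only disables alias-based autoloading; the module can still be loaded explicitly or as a dependency of another module.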
Prioritize patching multi-tenant GPU training nodes where untrusted workloads run alongside sensitive AI pipelines.
Enable kdump/crash reporting to detect exploitation attempts via unexpected kernel oops events on RDMA-equipped hosts.

How is it classified?

Code Execution
Inference Framework
AML.T0105 - Escape to Host
AML.T0112 - Machine Compromise

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 9 - Risk Management System

ISO 42001

A.8.1 - Technical Measures for AI Systems

NIST AI RMF

MANAGE-2.2 - Risk Treatment

Frequently Asked Questions

What is CVE-2026-46176?

CVE-2026-46176 is a use-after-free and double-free bug in mlx5_ib_dev_res_srq_init() in the Linux kernel's mlx5 RDMA driver. When creation of the second Shared Receive Queue (s1) fails, the error path frees s0 but still stores the freed s0 pointer and the ERR_PTR s1 in device-global state, corrupting kernel memory and potentially enabling local privilege escalation on hosts with Mellanox/NVIDIA ConnectX adapters, such as InfiniBand or RoCE GPU training clusters.

Is CVE-2026-46176 actively exploited?

No confirmed active exploitation of CVE-2026-46176 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-46176?

1. Apply the kernel stable patches from the git commits referenced in the CVE advisory (6fd93142dd1d, a13c2ac4d480, b087913ae882, bc2cf5935b46, c488df06bd55).
2. Monitor vendor security advisories for distribution-specific backports: Red Hat RHSA, Ubuntu USN, SUSE security updates (SUSE-SU), Debian DSA.
3. If RDMA is not actively required on a node, blacklist the mlx5_ib kernel module by adding 'blacklist mlx5_ib' to a .conf file under /etc/modprobe.d, regenerating the initramfs if it includes the module, and rebooting.
4. Prioritize patching multi-tenant GPU training nodes where untrusted workloads run alongside sensitive AI pipelines.
5. Enable kdump/crash reporting to detect exploitation attempts via unexpected kernel oops events on RDMA-equipped hosts.

What systems are affected by CVE-2026-46176?

This vulnerability affects the following AI/ML architecture patterns: distributed AI training clusters with RDMA networking, GPU compute infrastructure using InfiniBand or RoCE, model serving nodes with Mellanox/NVIDIA ConnectX adapters.

What is the CVSS score for CVE-2026-46176?

CVE-2026-46176 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.14%.

What is the AI security impact?

Affected AI Architectures

distributed AI training clusters with RDMA networking
GPU compute infrastructure using InfiniBand or RoCE
model serving nodes with Mellanox/NVIDIA ConnectX adapters

MITRE ATLAS Techniques

AML.T0105 Escape to Host

AML.T0112 Machine Compromise

Compliance Controls Affected

EU AI Act: Art. 9

ISO 42001: A.8.1

NIST AI RMF: MANAGE-2.2

What are the technical details?

Original Advisory

In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init()

mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When ib_create_srq() fails for s1, the error branch destroys s0 but falls through and unconditionally assigns the freed s0 and the ERR_PTR s1 to devr->s0 and devr->s1. This leads to several problems:

the lock-free fast path checks "if (devr->s1) return 0;" and treats the ERR_PTR as already initialised;
users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via to_msrq(devr->s0)->msrq.srqn;
mlx5_ib_dev_res_cleanup() dereferences the ERR_PTR and double-frees s0 on teardown.

Fix by adding the same `goto unlock` in the s1 failure path.

Exploitation Scenario

A local attacker with unprivileged access to a shared GPU training node deliberately induces memory pressure or another resource-exhaustion condition during mlx5 RDMA adapter initialization, causing the ib_create_srq() call for s1 to fail. The vulnerable error path destroys s0 but falls through, writing the freed s0 pointer and an ERR_PTR for s1 into the device-global devr slots. The fast-path check misreads the ERR_PTR as an initialized structure and returns success, so mlx5_ib_create_qp() later dereferences the freed SRQ. The attacker then uses slab heap grooming to place a controlled allocation at the freed s0 address, gains a kernel write primitive, escalates to root, and escapes any container isolation protecting co-resident AI training workloads.