CVE-2026-31253: flash-attention: RCE via unsafe checkpoint deserialization

GHSA-7g5w-pq96-8c5w HIGH
Published May 11, 2026
CISO Take

Flash-attention's checkpoint loading functions call torch.load() without the weights_only=True safeguard, allowing arbitrary Python objects to be deserialized via pickle — a well-documented attack primitive that trivially maps to full code execution on the victim's machine. Any team loading external or shared checkpoints during model warmstarting or evaluation is directly exposed: an attacker who can supply a malicious checkpoint file through a compromised model registry, poisoned shared storage, or targeted file delivery achieves arbitrary code execution in the context of the training job. While no public exploit or CISA KEV entry exists yet, pickle deserialization payloads require minimal skill to craft and public tooling automates the process. Patch to a commit past e724e2588c, enforce weights_only=True across all torch.load() calls, migrate to safetensors format where feasible, and scan ingested checkpoints with fickling or modelscan in CI before loading.

Sources: NVD, ATLAS

What is the risk?

Risk is HIGH (CVSS 3.1 base score 7.3). Pickle deserialization payloads are trivial to construct, and the class is common enough that dedicated tools such as fickling and modelscan exist for it, so the attacker skill bar is low. The vulnerability sits in the training and evaluation path, a context that is typically highly privileged: GPU workers frequently carry attached cloud IAM roles, access to internal APIs, and raw training data. Collaborative ML environments with shared NFS mounts or model registries amplify the blast radius. The primary mitigating factor is that exploitation requires controlling a checkpoint file the victim loads, which limits opportunistic mass exploitation but leaves targeted attacks highly practical.

How does the attack unfold?

Artifact Staging
Adversary crafts a malicious PyTorch checkpoint that embeds a pickle payload, then either stages it in a shared model registry or NFS mount, or delivers it via a spearphishing message that presents it as a collaborative resource.
AML.T0018.002
User Execution
Victim runs model evaluation or warmstart training using flash-attention, triggering load_checkpoint() or eval.py to call torch.load() on the malicious checkpoint file.
AML.T0011.000
Code Execution
torch.load() deserializes the pickle payload without restriction, executing arbitrary Python code in the training job's process context with full user privileges.
AML.T0050
Impact
Adversary achieves RCE on training infrastructure, enabling cloud credential exfiltration, lateral movement to co-located compute nodes, or persistent backdoor installation in the ML environment.
AML.T0025
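
The Code Execution stage above rests on how pickle works: a serialized object can name any callable to be invoked at load time. A minimal, harmless sketch using only the standard library follows. The class name MaliciousCheckpoint and the DEMO_PAYLOAD_RAN environment variable are hypothetical, and a benign expression stands in for a real payload.

```python
import os
import pickle


class MaliciousCheckpoint:
    """Hypothetical payload object; the class name is illustrative only."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless expression stands in for a reverse shell or credential theft.
        code = "__import__('os').environ.setdefault('DEMO_PAYLOAD_RAN', '1')"
        return (eval, (code,))


os.environ.pop("DEMO_PAYLOAD_RAN", None)
blob = pickle.dumps(MaliciousCheckpoint())

# The victim only deserializes the bytes; no attribute or method is touched.
result = pickle.loads(blob)

print("payload ran:", os.environ.get("DEMO_PAYLOAD_RAN") == "1")
```

The side effect happens inside pickle.loads() itself, which is why a torch.load() call that permits arbitrary pickled objects is enough to run attacker code.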

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
PyTorch   pip         <= 2.8.3           No patch

Do you load checkpoints with flash-attention's load_checkpoint() or eval.py? You're affected.

How severe is it?

CVSS 3.1: 7.3 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 12% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Trivial

What is the attack surface?

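Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): Low
Integrity (I): Low
Availability (A): Low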

What should I do?

  1. Update flash-attention to a commit past e724e2588cbe754beb97cf7c011b5e7e34119e62.

  2. Audit all torch.load() calls across training and evaluation scripts and enforce weights_only=True on every call.

  3. Migrate checkpoint serialization to safetensors format where possible — safetensors is structurally immune to pickle-based deserialization attacks.

  4. Implement integrity verification for all externally sourced checkpoints using SHA-256 checksums with out-of-band signature distribution.

  5. Integrate fickling or modelscan checkpoint scanning into CI/CD pipelines before any checkpoint is loaded in training or evaluation.

  6. Restrict training job network access and file system permissions to limit post-exploitation lateral movement.

  7. Treat any checkpoint from an unverified source as untrusted until cryptographically validated.
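
Step 2 (auditing torch.load() calls) can be partly automated. The sketch below uses Python's ast module to flag calls spelled literally as torch.load(...) that do not pass weights_only=True. The function name find_unsafe_torch_load and the sample source are hypothetical, and the matcher is intentionally simple: it misses aliased imports such as `from torch import load`.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>"):
    """Return (line, reason) for each torch.load call lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the literal spelling `torch.load(...)` only; aliases are missed.
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        kw = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if kw is None:
            findings.append((node.lineno, "weights_only not set"))
        elif not (isinstance(kw.value, ast.Constant) and kw.value.value is True):
            findings.append((node.lineno, "weights_only is not the literal True"))
    return sorted(findings)


# Hypothetical source text to audit; nothing here imports or runs torch.
sample = '''import torch
a = torch.load("a.pt")
b = torch.load("b.pt", weights_only=True)
c = torch.load("c.pt", map_location="cpu", weights_only=False)
'''

for line, reason in find_unsafe_torch_load(sample):
    print(f"line {line}: {reason}")
```

Flagging an unset weights_only is deliberately conservative. PyTorch 2.6 and later default the flag to True, but setting it explicitly keeps behavior the same on older installations.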

What does CISA's SSVC say?

Decision: Track
Exploitation: none
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.10.2 - AI system supply chain
NIST AI RMF: MANAGE 2.2 - Risks or related impacts from third-party entities are managed
OWASP LLM Top 10: LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2026-31253?

CVE-2026-31253 is an insecure deserialization vulnerability (CWE-502) in flash-attention's checkpoint loading code. The load_checkpoint() function in checkpoint.py and the loading code in eval.py call torch.load() without weights_only=True, so a maliciously crafted checkpoint can execute arbitrary code when a victim loads it during model warmstarting or evaluation.

Is CVE-2026-31253 actively exploited?

No confirmed active exploitation of CVE-2026-31253 has been reported, but organizations should still patch proactively.

How to fix CVE-2026-31253?

  1. Update flash-attention to a commit past e724e2588cbe754beb97cf7c011b5e7e34119e62.

  2. Audit all torch.load() calls across training and evaluation scripts and enforce weights_only=True on every call.

  3. Migrate checkpoint serialization to safetensors format where possible; safetensors is structurally immune to pickle-based deserialization attacks.

  4. Implement integrity verification for all externally sourced checkpoints using SHA-256 checksums with out-of-band signature distribution.

  5. Integrate fickling or modelscan checkpoint scanning into CI/CD pipelines before any checkpoint is loaded in training or evaluation.

  6. Restrict training job network access and file system permissions to limit post-exploitation lateral movement.

  7. Treat any checkpoint from an unverified source as untrusted until cryptographically validated.
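
The integrity check in step 4 can be sketched with the standard library alone. The helper names sha256_of_file and verify_checkpoint are hypothetical, the demo file is a throwaway temp file rather than a real checkpoint, and the expected digest is assumed to arrive over a separate trusted channel.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected_hex: str) -> bool:
    """Compare against a digest obtained out of band, before any load call."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Demo with a stand-in file; a real pipeline would hash the downloaded artifact.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pt") as tmp:
    tmp.write(b"not a real checkpoint")
    demo_path = tmp.name

published = hashlib.sha256(b"not a real checkpoint").hexdigest()
ok = verify_checkpoint(demo_path, published)
tampered = verify_checkpoint(demo_path, "0" * 64)
os.remove(demo_path)

print("matches published digest:", ok)
print("matches wrong digest:", tampered)
```

A checksum only proves the file matches what the publisher hashed. It does not prove the publisher's file is benign, so it complements the other steps rather than replacing them.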

What systems are affected by CVE-2026-31253?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model evaluation, distributed training, fine-tuning workflows, model serving.

What is the CVSS score for CVE-2026-31253?

CVE-2026-31253 has a CVSS v3.1 base score of 7.3 (HIGH). The EPSS exploitation probability is 0.22%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model evaluation, distributed training, fine-tuning workflows, model serving

MITRE ATLAS Techniques

AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0025 Exfiltration via Cyber Means

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.10.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

The flash-attention training framework through commit e724e2588cbe754beb97cf7c011b5e7e34119e62 (2025-13-04) contains an insecure deserialization vulnerability (CWE-502) in its checkpoint loading mechanism. The load_checkpoint() function in checkpoint.py and the checkpoint loading code in eval.py use torch.load() without enabling the security-restrictive weights_only=True parameter. This allows the deserialization of arbitrary Python objects via the pickle module. An attacker can exploit this by providing a maliciously crafted checkpoint file. When a victim loads this checkpoint during model warmstarting or evaluation, arbitrary code is executed on the victim's system.
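
To illustrate why a restricted loader closes this hole, the sketch below applies the allowlist pattern from the Python pickle documentation: the unpickler refuses to resolve any global that is not explicitly permitted. This is not PyTorch's implementation of weights_only=True, only the same idea in miniature. The names AllowlistUnpickler, restricted_loads, ALLOWED and Payload are hypothetical.

```python
import io
import pickle
from collections import OrderedDict

# Illustrative allowlist; a real loader would permit tensor-related types too.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        # Harmless stand-in for attacker code.
        return (eval, ("1 + 1",))


benign = pickle.dumps(OrderedDict(layer=[0.1, 0.2]))
state = restricted_loads(benign)  # plain containers and floats load normally

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print("rejected:", err)
```

The payload is rejected when the stream tries to resolve builtins.eval, before anything is called.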

Exploitation Scenario

An adversary targets an ML engineering team by contributing a 'pre-trained checkpoint' to a shared internal model registry or delivering it via a spearphish posing as a collaborative researcher. The checkpoint is a structurally valid PyTorch file containing an embedded pickle payload that establishes a reverse shell or exfiltrates cloud IAM credentials to an attacker-controlled endpoint. A team member runs evaluation using flash-attention's eval.py — torch.load() deserializes the pickle payload without restriction and executes the adversary's code in the context of the GPU training job. On a cloud instance with an attached IAM role, this immediately yields cloud provider credentials. On a shared compute cluster, the adversary pivots to other active jobs via shared NFS storage, broadening the compromise to the entire team's infrastructure.
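
A payload like the one in this scenario has to reference an importable callable, and that reference is visible in the pickle opcode stream without executing it. The following is a deliberately naive sketch of that idea using pickletools. It is an assumption-laden illustration, not a substitute for fickling or modelscan: the allowlist is hypothetical, the STACK_GLOBAL handling ignores memoized strings, and a real PyTorch checkpoint would first need its embedded pickle extracted.

```python
import pickle
import pickletools
from collections import OrderedDict

# Hypothetical allowlist in pickletools' "module name" form.
ALLOWED_GLOBALS = {"collections OrderedDict"}
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data: bytes) -> set:
    """Collect the 'module name' pairs a pickle stream would import on load."""
    found = set()
    recent_strings = []  # string operands seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            found.add(arg)  # genops already yields "module name"
        elif opcode.name in STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "STACK_GLOBAL":
            # Naive: assume the two most recent strings are module and name.
            if len(recent_strings) >= 2:
                found.add(f"{recent_strings[-2]} {recent_strings[-1]}")
            else:
                found.add("<unresolved STACK_GLOBAL>")
    return found


def disallowed_globals(data: bytes) -> set:
    return referenced_globals(data) - ALLOWED_GLOBALS


class Payload:
    def __reduce__(self):
        return (eval, ("1 + 1",))  # harmless stand-in for attacker code


benign = pickle.dumps(OrderedDict(a=1), protocol=4)
malicious = pickle.dumps(Payload(), protocol=4)

print("benign:", disallowed_globals(benign))
print("malicious:", disallowed_globals(malicious))
```

Nothing is unpickled here; genops() only parses the byte stream, so the check is safe to run on untrusted input in CI.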

Weaknesses (CWE)

CWE-94 — Improper Control of Generation of Code ('Code Injection'): The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.

  • [Architecture and Design] Refactor your program so that you do not have to dynamically generate code.
  • [Architecture and Design] Run your code in a "jail" or similar sandbox environment that enforces strict boundaries between the process and the operating system. This may effectively restrict which code can be executed by your product. Examples include the Unix chroot jail and AppArmor. In general, managed code may provide some protection. This may not be a feasible solution, and it only limits the impact to the operating system; the rest of your application may still be subject to compromise. Be careful to avoid CWE-243 and other weaknesses related to jails.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L
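
The 7.3 base score can be reproduced from this vector with the CVSS v3.1 base equations. The sketch below handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Drop the "CVSS:3.1" prefix and split the rest into metric:value pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L"))  # 7.3
```

Impact is about 3.37 and exploitability about 3.89; their sum of roughly 7.26 rounds up to 7.3.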

Timeline

Published: May 11, 2026
Last Modified: May 18, 2026
First Seen: May 11, 2026

Related Vulnerabilities