CVE-2024-5998: LangChain: RCE via FAISS pickle deserialization

HIGH PoC AVAILABLE CISA: TRACK*
Published September 17, 2024
CISO Take

Any LangChain deployment loading FAISS vector store indexes from external or shared sources is vulnerable to full host compromise. An attacker who controls a FAISS index file can execute arbitrary OS commands the moment the file is deserialized — a routine operation in RAG pipelines. Patch immediately and treat all FAISS index files as untrusted input requiring integrity verification.

Risk Assessment

High operational risk for AI/ML teams despite the Local attack vector classification. In practice, loading a vector store from S3, a shared drive, or a model registry is standard pipeline behavior — the required 'user interaction' is indistinguishable from normal operations. Pickle-based RCE is a well-understood exploit class requiring minimal attacker skill. Full C/I/A impact means complete host compromise, including exfiltration of API keys, model artifacts, and training data. Exposure is broad given LangChain's dominant market position in enterprise RAG deployments.

Affected Systems

Package: langchain
Ecosystem: pip
Vulnerable Range: (not listed)
Patched: No patch
135.7K | OpenSSF 6.5 | 2.6K dependents | Pushed 7d ago | 17% patched | ~256d to patch

Do you use langchain? You're affected.

Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: 0.1% chance of exploitation in 30 days (higher than 25% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

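Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High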

Recommended Action

6 steps
  1. PATCH

    Update langchain to a release that contains commit 604dfe2d99246b0c09f047c604f0c63eafba31e7 or later, then confirm the installed version with pip show langchain.

  2. AUDIT

    Grep codebase for FAISS.deserialize_from_bytes and FAISS.load_local — document every call site and its data source.

  3. INPUT CONTROL

    Only load FAISS indexes from internally generated, cryptographically signed sources; implement SHA-256 hash verification before deserialization.

  4. SANDBOX

    Run FAISS deserialization in restricted containers with seccomp profiles that block process-spawning syscalls such as execve, which Python's os.system and subprocess rely on.

  5. DETECT

    Alert on unexpected subprocess or shell spawning from Python processes handling vector stores; monitor for outbound connections post-deserialization.

  6. WORKAROUND (pre-patch)

    Stop calling FAISS.deserialize_from_bytes on data you did not generate. Prefer FAISS.load_local with allow_dangerous_deserialization=False where available, which is expected to refuse pickle-based loading rather than perform it, or switch to a non-pickle vector store format.
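
The input-control step (SHA-256 verification before deserialization) can be sketched as follows. This is a minimal illustration, assuming the expected digest reaches the application through a separate trusted channel. `verify_sha256` and `load_verified` are hypothetical helper names, not LangChain APIs, and the deserializer is passed in as a callable so the sketch does not depend on any particular LangChain version.

```python
import hashlib
import hmac
from typing import Callable, TypeVar

T = TypeVar("T")


def verify_sha256(blob: bytes, expected_hex: str) -> bool:
    """Return True only if blob hashes to the expected SHA-256 digest."""
    actual = hashlib.sha256(blob).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.lower())


def load_verified(blob: bytes, expected_hex: str,
                  deserialize: Callable[[bytes], T]) -> T:
    """Run `deserialize` only after the integrity check passes."""
    if not verify_sha256(blob, expected_hex):
        raise ValueError("FAISS index digest mismatch; refusing to deserialize")
    return deserialize(blob)
```

A bare hash only helps if the attacker cannot also replace the stored digest. Where the index and its digest live in the same bucket, a keyed HMAC or a detached signature is probably the better fit.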

CISA SSVC Assessment

Decision: Track*
Exploitation: poc
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system risk treatment
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain
LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2024-5998?

CVE-2024-5998 is a pickle deserialization vulnerability in the FAISS.deserialize_from_bytes function of langchain-ai/langchain. An attacker who controls a FAISS index file can execute arbitrary OS commands the moment the file is deserialized, which is a routine operation in RAG pipelines. Any deployment that loads FAISS indexes from external or shared sources is exposed to full host compromise.

Is CVE-2024-5998 actively exploited?

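Proof-of-concept exploit code for CVE-2024-5998 is publicly available, which increases the risk of exploitation. The CISA SSVC assessment on this page lists the exploitation status as PoC; the page does not report confirmed exploitation in the wild.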

How to fix CVE-2024-5998?

Follow the six steps under Recommended Action above. In short: update langchain to a release containing commit 604dfe2d99246b0c09f047c604f0c63eafba31e7 or later; audit every FAISS.deserialize_from_bytes and FAISS.load_local call site and its data source; load only internally generated, integrity-verified indexes; run deserialization in a sandbox that cannot spawn processes; alert on unexpected subprocess or shell activity from Python processes handling vector stores; and, until patched, avoid pickle-based loading of untrusted data.
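
The audit step can be partly automated. The sketch below is one possible approach, assuming the code under review refers to the class by the literal name `FAISS`; `find_faiss_call_sites` is a hypothetical helper, not part of LangChain. It parses Python source with the standard `ast` module and reports each call to the two methods named in this advisory.

```python
import ast
from typing import List, Tuple

# Method names taken from the advisory text.
RISKY_METHODS = {"deserialize_from_bytes", "load_local"}


def find_faiss_call_sites(source: str,
                          filename: str = "<string>") -> List[Tuple[str, int, str]]:
    """Return (filename, line, method) for each FAISS.<risky method>(...) call."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr in RISKY_METHODS
            and isinstance(func.value, ast.Name)
            and func.value.id == "FAISS"  # misses aliases such as `import ... as F`
        ):
            hits.append((filename, node.lineno, func.attr))
    return sorted(hits)
```

Unlike a plain grep, this ignores matches inside comments and strings, but it will miss aliased imports and dynamic attribute access, so it complements a manual review rather than replacing one.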

What systems are affected by CVE-2024-5998?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, vector databases, agent frameworks, document QA systems, knowledge base systems.

What is the CVSS score for CVE-2024-5998?

CVE-2024-5998 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.09%.

Technical Details

NVD Description

A vulnerability in the FAISS.deserialize_from_bytes function of langchain-ai/langchain allows for pickle deserialization of untrusted data. This can lead to the execution of arbitrary commands via the os.system function. The issue affects the latest version of the product.

Exploitation Scenario

An attacker identifies a victim organization's RAG pipeline that loads FAISS indexes from an S3 bucket shared with external collaborators, or from user-supplied files. Using publicly documented pickle exploit techniques, the attacker crafts a FAISS index file that embeds a malicious pickle payload: an object whose __reduce__ method returns os.system together with a reverse shell command. The attacker uploads the file to the S3 bucket or submits it through an API endpoint that accepts vector store uploads. When the LangChain application calls FAISS.deserialize_from_bytes() or FAISS.load_local() during normal startup or query handling, the payload executes with the application's privileges. From there the attacker can establish persistent access, exfiltrate OpenAI/Anthropic API keys from environment variables, and pivot to internal model infrastructure.
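
The mechanism behind this scenario, and one general defense, can be shown with a harmless payload. The sketch below is not LangChain's fix; it follows the "restricting globals" pattern from the Python `pickle` documentation. `Payload`, `RestrictedUnpickler`, `restricted_loads`, and the allow-list contents are illustrative assumptions. The benign `Payload` makes the unpickler call `int("42")`, standing in for the `os.system` call an attacker would choose.

```python
import io
import pickle

# Assumption: an illustrative allow-list. A real one must name exactly the
# classes the application legitimately stores in its pickles.
ALLOWED_GLOBALS = {("builtins", "dict"), ("builtins", "list"), ("builtins", "set")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global (class/function).
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside ALLOWED_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Benign stand-in for a malicious object: unpickling it calls int('42')."""

    def __reduce__(self):
        return (int, ("42",))


blob = pickle.dumps(Payload())
print(pickle.loads(blob))  # prints 42: the unpickler invoked int("42")
```

Calling `restricted_loads(blob)` raises `pickle.UnpicklingError` instead, because `builtins.int` is not on the allow-list. An allow-list for real FAISS docstore pickles would likely need to include the document and docstore classes the application uses, which is why integrity verification of the file is still worthwhile.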

Weaknesses (CWE)

CWE-502: Deserialization of Untrusted Data

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
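
As a worked check, the 7.8 base score can be recomputed from this vector. The sketch below implements the base-score equations and metric weights from the CVSS v3.1 specification; `base_score` and `roundup` are names chosen here for illustration, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' base vector string."""
    metrics = dict(p.split(":") for p in vector.split("/")[1:])  # skip prefix
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - ((1 - CIA[metrics["C"]])
               * (1 - CIA[metrics["I"]])
               * (1 - CIA[metrics["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # prints 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, about 7.71, rounds up to 7.8. The Local attack vector and required user interaction are what hold the score below the 9.8 that the same impact would get with AV:N and UI:N.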

Timeline

Published: September 17, 2024
Last Modified: July 30, 2025
First Seen: September 17, 2024

Related Vulnerabilities