CVE-2024-5998: LangChain: RCE via FAISS pickle deserialization

HIGH | PoC AVAILABLE | CISA: TRACK*
Published September 17, 2024

CISO Take

Any LangChain deployment loading FAISS vector store indexes from external or shared sources is vulnerable to full host compromise. An attacker who controls a FAISS index file can execute arbitrary OS commands the moment the file is deserialized — a routine operation in RAG pipelines. Patch immediately and treat all FAISS index files as untrusted input requiring integrity verification.

What is the risk?

High operational risk for AI/ML teams despite the Local attack vector classification. In practice, loading a vector store from S3, a shared drive, or a model registry is standard pipeline behavior — the required 'user interaction' is indistinguishable from normal operations. Pickle-based RCE is a well-understood exploit class requiring minimal attacker skill. Full C/I/A impact means complete host compromise, including exfiltration of API keys, model artifacts, and training data. Exposure is broad given LangChain's dominant market position in enterprise RAG deployments.

What systems are affected?

Package     Ecosystem   Vulnerable Range   Patched
LangChain   pip         not specified      No patch
139.8K | OpenSSF 5.9 | 2.7K dependents | Pushed 2d ago | 24% patched | ~156d to patch

Do you use LangChain? You're affected.

How severe is it?

CVSS 3.1: 7.8 / 10
EPSS: 0.4% chance of exploitation in 30 days (higher than 28% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

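Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High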

What should I do?

  1. PATCH

    Update langchain to the version containing commit 604dfe2d99246b0c09f047c604f0c63eafba31e7 or later — verify with pip show langchain.

  2. AUDIT

    Grep codebase for FAISS.deserialize_from_bytes and FAISS.load_local — document every call site and its data source.

  3. INPUT CONTROL

    Only load FAISS indexes from internally generated, cryptographically signed sources; implement SHA-256 hash verification before deserialization.

  4. SANDBOX

    Run FAISS deserialization in restricted containers with seccomp profiles that block process-creation syscalls such as execve, which os.system and subprocess rely on to launch commands.

  5. DETECT

    Alert on unexpected subprocess or shell spawning from Python processes handling vector stores; monitor for outbound connections post-deserialization.

  6. WORKAROUND (pre-patch)

    Replace FAISS.deserialize_from_bytes with safer alternatives such as FAISS.load_local with allow_dangerous_deserialization=False where available (in releases that support the flag, this setting makes load_local refuse to unpickle the stored data), or switch to a non-pickle vector store format.
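
The hash check in step 3 can be sketched as follows. This is a minimal illustration, not part of LangChain: the helper names sha256_of_file and verify_index_file are hypothetical, and the sketch assumes the expected digest is obtained through a trusted channel (for example, a signed manifest) rather than stored next to the index file.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_index_file(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file matches the pinned digest.

    Hypothetical helper: call it before handing the file to any
    pickle-based loader.
    """
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError(f"digest mismatch for {path}: refusing to deserialize")
```

A digest only proves the file is the one that was pinned. It does not make an attacker-supplied file safe, so the pinned value must come from the party that generated the index.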

What does CISA's SSVC say?

Decision: Track*
Exploitation: poc
Automatable: No
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system risk treatment
NIST AI RMF
MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM03:2025 - Supply Chain
LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2024-5998?

CVE-2024-5998 is a pickle deserialization vulnerability in the FAISS.deserialize_from_bytes function of langchain-ai/langchain. An attacker who controls a FAISS index file can execute arbitrary OS commands the moment the file is deserialized, which is a routine operation in RAG pipelines. Any LangChain deployment that loads FAISS vector store indexes from external or shared sources is exposed to full host compromise.

Is CVE-2024-5998 actively exploited?

CISA's SSVC assessment records the exploitation status as "poc" rather than "active". Proof-of-concept exploit code is publicly available for CVE-2024-5998, which increases the risk of exploitation.

How to fix CVE-2024-5998?

1. PATCH: Update langchain to the version containing commit 604dfe2d99246b0c09f047c604f0c63eafba31e7 or later; verify with pip show langchain.
2. AUDIT: Grep the codebase for FAISS.deserialize_from_bytes and FAISS.load_local; document every call site and its data source.
3. INPUT CONTROL: Only load FAISS indexes from internally generated, cryptographically signed sources; implement SHA-256 hash verification before deserialization.
4. SANDBOX: Run FAISS deserialization in restricted containers with seccomp profiles that block process-creation syscalls such as execve, which os.system and subprocess rely on to launch commands.
5. DETECT: Alert on unexpected subprocess or shell spawning from Python processes handling vector stores; monitor for outbound connections post-deserialization.
6. WORKAROUND (pre-patch): Replace FAISS.deserialize_from_bytes with safer alternatives such as FAISS.load_local with allow_dangerous_deserialization=False where available, or switch to a non-pickle vector store format.
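
The audit step can be partly automated. The following is a rough sketch, assuming the class is referenced under the literal name FAISS; the functions find_faiss_call_sites and scan_tree are hypothetical helpers, and aliased imports (for example `from ... import FAISS as Store`) would be missed and still need a manual review.

```python
import ast
from pathlib import Path

# Methods named in the advisory's audit step.
RISKY_METHODS = {"deserialize_from_bytes", "load_local"}


def find_faiss_call_sites(source: str, filename: str = "<string>"):
    """Return sorted (filename, line, method) tuples for FAISS.<method>(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr in RISKY_METHODS
            and isinstance(func.value, ast.Name)
            and func.value.id == "FAISS"
        ):
            hits.append((filename, node.lineno, func.attr))
    return sorted(hits)


def scan_tree(root: Path):
    """Scan every .py file under root, skipping files that fail to parse."""
    results = []
    for path in root.rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8")
            results.extend(find_faiss_call_sites(text, str(path)))
        except (SyntaxError, UnicodeDecodeError):
            continue
    return results
```

Each reported line still needs a human to record where the bytes or directory passed to the call come from, since that data source is what determines exposure.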

What systems are affected by CVE-2024-5998?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, vector databases, agent frameworks, document QA systems, knowledge base systems.

What is the CVSS score for CVE-2024-5998?

CVE-2024-5998 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.36%.

What is the AI security impact?

Affected AI Architectures

RAG pipelines, vector databases, agent frameworks, document QA systems, knowledge base systems

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0050 Command and Scripting Interpreter

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: 8.4
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM03:2025, LLM08:2025

What are the technical details?

Original Advisory

A vulnerability in the FAISS.deserialize_from_bytes function of langchain-ai/langchain allows for pickle deserialization of untrusted data. This can lead to the execution of arbitrary commands via the os.system function. The issue affects the latest version of the product.
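
To illustrate why unpickling untrusted bytes amounts to code execution, here is a deliberately harmless demonstration. It is not the published proof of concept: the class name Demo is made up, and os.getcwd stands in for the os.system call mentioned in the advisory.

```python
import os
import pickle


class Demo:
    def __reduce__(self):
        # pickle stores "call this function with these arguments".
        # os.getcwd is a benign stand-in; the advisory describes payloads
        # that reference os.system instead.
        return (os.getcwd, ())


payload = pickle.dumps(Demo())

# Loading the bytes calls os.getcwd() and returns its result;
# no Demo instance is ever reconstructed.
result = pickle.loads(payload)
```

The loader never sees a "FAISS index" or a "Demo object" here. It simply invokes whatever callable the byte stream names, which is why the data source matters more than the file's apparent type.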

Exploitation Scenario

An attacker identifies a victim organization's RAG pipeline that loads FAISS indexes from an S3 bucket shared with external collaborators, or from user-supplied files. Using publicly documented pickle exploit techniques, the attacker crafts a FAISS index file that embeds a malicious pickle payload: an object whose __reduce__ method returns os.system together with a reverse shell command. The file is uploaded to the S3 bucket or supplied through an API endpoint that accepts vector store uploads. When the LangChain application calls FAISS.deserialize_from_bytes() or FAISS.load_local() during normal startup or query handling, the payload executes with the application's privileges. From there the attacker can establish persistent access, exfiltrate OpenAI/Anthropic API keys from environment variables, and pivot to internal model infrastructure.
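
One generic defense against this payload shape is an allow-list unpickler, following the "restricting globals" pattern from the Python pickle documentation. The sketch below is an assumption-laden illustration, not LangChain's fix: the names AllowListUnpickler, restricted_loads, and the contents of ALLOWED_GLOBALS are hypothetical, and a real deployment would have to list exactly the classes its stored data needs.

```python
import io
import pickle

# Hypothetical allow-list: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global, including
        # the callable returned by a malicious __reduce__.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Unpickle data, rejecting any global outside the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

With this in place, a stream that references os.system fails with UnpicklingError before the function is ever looked up, while plain containers such as dicts and lists still load. An overly broad allow-list defeats the purpose, so it should stay as short as the data permits.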

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
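
The HMAC suggestion in the first mitigation can be sketched like this. It is a minimal example under stated assumptions: seal and unseal are hypothetical helper names, the tag is simply prepended to the pickled bytes, and the secret key is assumed to be managed outside the code (for example, in a secrets manager) and unavailable to whoever can write the index file.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def seal(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    body = pickle.dumps(obj)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unseal(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC check failed: refusing to unpickle")
    return pickle.loads(body)
```

The important property is ordering: the authenticity check runs before pickle.loads touches the data, so a tampered or attacker-generated blob is rejected without being parsed.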

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
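
For readers who want to see how this vector yields the 7.8 base score, the arithmetic can be reproduced with the metric weights and rounding rule from the CVSS v3.1 specification. The sketch below handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope-unchanged PR values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise NotImplementedError("this sketch covers Scope: Unchanged only")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83, which sum to roughly 7.71 and round up to 7.8. The Local attack vector and required user interaction are what keep the score below the 9.x range despite full C/I/A impact.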

Timeline

Published: September 17, 2024
Last Modified: July 30, 2025
First Seen: September 17, 2024

Related Vulnerabilities