CVE-2024-5998: LangChain RCE — HIGH

CISO Take

Any LangChain deployment loading FAISS vector store indexes from external or shared sources is vulnerable to full host compromise. An attacker who controls a FAISS index file can execute arbitrary OS commands the moment the file is deserialized — a routine operation in RAG pipelines. Patch immediately and treat all FAISS index files as untrusted input requiring integrity verification.

What is the risk?

High operational risk for AI/ML teams despite the Local attack vector classification. In practice, loading a vector store from S3, a shared drive, or a model registry is standard pipeline behavior, so the required 'user interaction' is indistinguishable from normal operations. Pickle-based RCE is a well-understood exploit class that requires minimal attacker skill. Full C/I/A impact means complete host compromise, including exfiltration of API keys, model artifacts, and training data. Exposure is broad given LangChain's widespread use in enterprise RAG deployments.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
LangChain	pip	—	No patch
Package profile: OpenSSF score 5.9; 2.7K dependents; 24% patched; ~156 days to patch.

Do you use LangChain's FAISS integration to load vector store indexes? You may be affected.

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.4% chance of exploitation in the next 30 days (higher than 28% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ CISA SSVC: Public PoC

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

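Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

Vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H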
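As a worked check of the 7.8 score, the sketch below recomputes it from these eight metrics using the CVSS v3.1 base-score equations and metric weights published by FIRST. It is a minimal sketch that assumes Scope: Unchanged (true for this CVE) and ignores temporal and environmental metrics; base_score and roundup are illustrative names.

```python
import math

# CVSS v3.1 base metric weights (FIRST specification). PR values are the
# Scope: Unchanged ones; this sketch does not handle Scope: Changed.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal place >= value."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score of a Scope: Unchanged vector string."""
    parts = dict(p.split(":") for p in vector.split("/")[1:])  # skip "CVSS:3.1"
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - ((1 - WEIGHTS["CIA"][parts["C"]])
               * (1 - WEIGHTS["CIA"][parts["I"]])
               * (1 - WEIGHTS["CIA"][parts["A"]]))
    impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
                      * WEIGHTS["PR"][parts["PR"]] * WEIGHTS["UI"][parts["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

The Local attack vector (0.55) and required user interaction (0.62) are what pull the score below the 9.8 the same High/High/High impact would earn over the network with no interaction.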

What should I do?

6 steps

PATCH

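Update to a LangChain release that includes upstream commit 604dfe2d99246b0c09f047c604f0c63eafba31e7 or later. In current releases the FAISS integration ships in the langchain-community package (langchain_community.vectorstores), so verify with both pip show langchain and pip show langchain-community.

AUDIT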

Grep codebase for FAISS.deserialize_from_bytes and FAISS.load_local — document every call site and its data source.
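A minimal sketch of the AUDIT step, assuming a Python codebase: it walks *.py files and reports every line that calls deserialize_from_bytes or load_local, whether through the FAISS class or an instance. find_call_sites is a hypothetical helper. A text search like this over-reports (it also matches definitions and comments) and misses dynamic calls, so treat the output as a starting inventory, not a complete one.

```python
import re
from pathlib import Path

# Matches FAISS.load_local(...), store.load_local(...), FAISS.deserialize_from_bytes(...), etc.
PATTERN = re.compile(r"\b(?:FAISS\.)?(deserialize_from_bytes|load_local)\s*\(")


def find_call_sites(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped source line) for each match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole audit
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, find_call_sites("src") yields one tuple per call site; each one then needs its data source documented by hand.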

INPUT CONTROL

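Only load FAISS indexes from internally generated, cryptographically signed sources. Before deserialization, verify each file against a SHA-256 digest or HMAC tag obtained through a trusted channel separate from the file itself; a digest stored next to the file proves nothing if an attacker can replace both.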
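The INPUT CONTROL step can be sketched with Python's standard hmac module, assuming the serialized index is available as bytes and a secret key is held only by the build pipeline and the loader (key storage is out of scope here). A keyed HMAC-SHA256 tag is used rather than a bare SHA-256 digest because an attacker who can rewrite the file cannot forge a matching tag without the key. sign_index and verify_index are hypothetical names, and the LangChain call appears only in a comment.

```python
import hashlib
import hmac


def sign_index(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for serialized index bytes (run where the index is built)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_index(data: bytes, key: bytes, expected_tag: str) -> bytes:
    """Return data unchanged if its tag matches; raise before anything is deserialized."""
    actual = sign_index(data, key)
    # compare_digest runs in constant time, so timing does not reveal matching prefixes.
    if not hmac.compare_digest(actual, expected_tag):
        raise ValueError("index failed integrity check; refusing to deserialize")
    return data


# Hypothetical usage: only verified bytes ever reach the deserializer.
# trusted = verify_index(blob, key, tag_from_trusted_store)
# store = FAISS.deserialize_from_bytes(trusted, embeddings, ...)
```

A valid tag only proves the file came from whoever holds the key; it is a complement to patching, not a substitute.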

SANDBOX

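Run FAISS deserialization in restricted containers with seccomp profiles that block the execve syscall, which os.system relies on to launch a shell (os.system is a library call, not a syscall). This stops spawned commands but not payloads that execute Python code inside the process itself.

DETECT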

Alert on unexpected subprocess or shell spawning from Python processes handling vector stores; monitor for outbound connections post-deserialization.
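One way to implement the DETECT step from inside the Python process is CPython's audit-hook mechanism (sys.addaudithook, Python 3.8+), which raises named events such as os.system, subprocess.Popen, and pickle.find_class. The sketch below is a tripwire under stated assumptions, not a control: the event, module, and builtin lists are illustrative choices to tune, alerts go to the standard logging module under a hypothetical logger name, and logging an alert does not stop the payload (a hook that raises would abort the call instead, at the risk of breaking legitimate code).

```python
import logging
import sys

log = logging.getLogger("vectorstore.audit")
ALERTS: list[tuple[str, tuple]] = []  # in-memory record, e.g. for tests or a metrics exporter

# CPython audit events raised when a process or shell is spawned.
SPAWN_EVENTS = {"os.system", "os.exec", "os.spawn", "os.posix_spawn", "subprocess.Popen"}
# Globals a benign vector store pickle has no reason to reference (tune for your data).
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess"}
SUSPICIOUS_BUILTINS = {"eval", "exec", "compile", "__import__"}


def audit_hook(event: str, args: tuple) -> None:
    """Record process spawns and suspicious global lookups made while unpickling."""
    suspicious = event in SPAWN_EVENTS
    if event == "pickle.find_class":
        module, name = args
        suspicious = module in SUSPICIOUS_MODULES or (
            module == "builtins" and name in SUSPICIOUS_BUILTINS)
    if suspicious:
        summary = args[:2]  # drop trailing env dicts so secrets never reach the logs
        ALERTS.append((event, summary))
        log.warning("suspicious audit event %s args=%r", event, summary)


# Install once at startup; CPython provides no API to remove an audit hook later.
sys.addaudithook(audit_hook)
```

A pickle.find_class alert naming os or subprocess during an index load is a strong signal that the file was tampered with, and it fires before the referenced function is called.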
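
WORKAROUND (pre-patch)

FAISS.load_local is not a safe drop-in replacement for FAISS.deserialize_from_bytes: it also unpickles the docstore saved alongside the index. Where it exposes allow_dangerous_deserialization, leaving that flag False (the default) makes the call refuse to load pickled data rather than load it safely. Until you can upgrade, stop deserializing indexes from sources you do not control, or switch to a non-pickle vector store format.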
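What a non-pickle format could look like, as a hypothetical sketch: as we understand LangChain's FAISS wrapper, the vectors live in FAISS's own index format while the pickled part is the docstore and the row-to-document-id mapping. The code below stores those two pieces as JSON and rebuilds plain objects with type checks on load, so loading can only produce data, never call code. Doc, dump_docstore, and load_docstore are illustrative names, not LangChain APIs, and reconnecting the result to a FAISS index is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a stored document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dump_docstore(docs: dict[str, Doc], index_to_id: dict[int, str]) -> str:
    """Serialize documents and the FAISS-row -> document-id mapping as plain JSON."""
    return json.dumps({
        "docs": {doc_id: {"page_content": d.page_content, "metadata": d.metadata}
                 for doc_id, d in docs.items()},
        # JSON object keys must be strings, so row numbers are stringified here.
        "index_to_id": {str(row): doc_id for row, doc_id in index_to_id.items()},
    })


def load_docstore(text: str) -> tuple[dict[str, Doc], dict[int, str]]:
    """Parse JSON and populate fresh, validated objects instead of trusting the input."""
    raw = json.loads(text)
    docs = {}
    for doc_id, d in raw["docs"].items():
        if not (isinstance(d, dict) and isinstance(d.get("page_content"), str)
                and isinstance(d.get("metadata", {}), dict)):
            raise ValueError(f"malformed document {doc_id!r}")
        docs[doc_id] = Doc(d["page_content"], d.get("metadata", {}))
    index_to_id = {int(row): doc_id for row, doc_id in raw["index_to_id"].items()}
    if any(doc_id not in docs for doc_id in index_to_id.values()):
        raise ValueError("index_to_id references an unknown document")
    return docs, index_to_id
```

Metadata values must be JSON-serializable for this to work, which is itself a useful constraint on what an index file may carry.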

What does CISA's SSVC say?

Decision Track*

Exploitation poc

Automatable No

Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Code Execution, Supply Chain, Framework, RAG

AML.T0010.001 - AI Software
AML.T0011.000 - Unsafe AI Artifacts
AML.T0018.002 - Embed Malware
AML.T0050 - Command and Scripting Interpreter

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

8.4 - AI system risk treatment

NIST AI RMF

MANAGE 2.2 - Mechanisms to sustain the value of deployed AI systems

OWASP LLM Top 10

LLM03:2025 - Supply Chain
LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

Is CVE-2024-5998 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-5998, increasing the risk of exploitation.

What is the CVSS score for CVE-2024-5998?

CVE-2024-5998 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.36%.

What is the AI security impact?

Affected AI Architectures

RAG pipelines, vector databases, agent frameworks, document QA systems, knowledge base systems

What are the technical details?

Original Advisory

A vulnerability in the FAISS.deserialize_from_bytes function of langchain-ai/langchain allows for pickle deserialization of untrusted data. This can lead to the execution of arbitrary commands via the os.system function. The issue affects the latest version of the product.
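To make the advisory's mechanism concrete: pickle lets the serialized data itself name a callable and its arguments through __reduce__, and pickle.loads invokes that callable. The harmless sketch below uses os.getcwd in place of os.system so running it has no side effects; Payload is an illustrative name, and the loading side never needs that class to be defined.

```python
import os
import pickle


class Payload:
    """Illustrative object whose pickled form tells the unpickler to call a function."""

    def __reduce__(self):
        # Returns (callable, args). On load, pickle calls os.getcwd().
        # A malicious index file would name os.system and a shell command instead.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # no Payload comes back, only the callable's return value
print(result == os.getcwd())  # True: the function ran during deserialization
```

Nothing in the bytes has to look like a FAISS index for this to work, which is why integrity checks must happen before deserialization, not after.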

Exploitation Scenario

An attacker identifies a victim organization's RAG pipeline that loads FAISS indexes from an S3 bucket shared with external collaborators, or from user-supplied files. Using publicly documented pickle exploit techniques, the attacker crafts a FAISS index file embedding a malicious pickle payload whose __reduce__ method returns os.system with a reverse shell command as its argument. The file is uploaded to the S3 bucket or supplied via an API endpoint that accepts vector store uploads. When the LangChain application calls FAISS.deserialize_from_bytes() or FAISS.load_local() during normal startup or query handling, the payload executes with application privileges, giving the attacker a foothold to establish persistent access, exfiltrate OpenAI/Anthropic API keys from environment variables, and pivot to internal model infrastructure.

Weaknesses (CWE)

CWE-502 Deserialization of Untrusted Data

The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

[Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
[Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.

Source: MITRE CWE corpus.
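When existing pickle files cannot be migrated right away, the validation idea in the mitigations above can be applied with the "restricting globals" pattern from Python's pickle documentation: subclass pickle.Unpickler and override find_class so only allowlisted classes resolve. This is a swapped-in defense-in-depth technique, not LangChain's fix; ALLOWED, RestrictedUnpickler, and safe_loads are hypothetical names, and the allowlist must be tailored to what your files legitimately contain. It narrows what a payload can reach but does not make untrusted pickle safe.

```python
import io
import pickle

# Hypothetical allowlist: (module, name) pairs the stored data legitimately needs.
ALLOWED = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allowlist."""

    def find_class(self, module: str, name: str):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        # Any other reference, such as os.system or builtins.eval, is rejected here.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def safe_loads(data: bytes):
    """Deserialize bytes while blocking references to non-allowlisted callables."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers (dict, list, str, int) never go through find_class, so they load without appearing on the allowlist; every class reference does.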