CVE-2025-21604: MD5 collision enables RAG poisoning

CISO Take

Prior to v3.5.0, LangChain4j-AIDeepin used MD5, a cryptographically broken hash with publicly available collision tooling, for file deduplication. Attackers with document upload access can silently substitute malicious content into your RAG knowledge base without triggering any integrity alert. Patch to v3.5.0 immediately; treat any multi-user or externally accessible AIDeepin deployment as exposed.

What is the risk?

Low-to-medium inherent risk, elevated in multi-tenant or externally accessible deployments. MD5 collisions can be generated with existing public tools at moderate effort, but exploitation is not possible without prior file upload access. The primary exposure is for enterprises using AIDeepin for document-grounded AI assistants where knowledge base integrity underpins business decisions or compliance evidence.

How severe is it?

CVSS 3.1

N/A

EPSS

0.2% chance of exploitation in 30 days (higher than 16% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status

No known exploitation

Sophistication

Moderate

What should I do?

6 steps

Upgrade LangChain4j-AIDeepin to v3.5.0 immediately (patch: commit 3cf625c).
Audit existing file stores for hash conflicts — identical MD5 hashes with differing file sizes or byte content are indicators of tampering.
Restrict document upload to authenticated, authorized users only; treat document ingestion endpoints as privileged operations.
Re-index and re-validate the RAG knowledge base if exposure cannot be ruled out since deployment.
Implement SHA-256 or SHA-3 for all file integrity checks as defense-in-depth on any adjacent systems.
Enable content-based change detection alerts on knowledge base documents.
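
The audit step above can be sketched as a small script. This is a minimal sketch, not AIDeepin tooling: it assumes uploaded files sit on a local filesystem, and the function names (`file_digest`, `group_conflicts`, `find_md5_conflicts`) are hypothetical. It flags any MD5 digest shared by files whose SHA-256 digests differ, which is the tampering indicator described in the steps.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents are not loaded into memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_conflicts(records: Iterable[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Take (name, weak_digest, strong_digest) records and return the weak
    digests that are shared by items with different strong digests."""
    names: dict[str, list[str]] = defaultdict(list)
    strong: dict[str, set[str]] = defaultdict(set)
    for name, weak, strong_digest in records:
        names[weak].append(name)
        strong[weak].add(strong_digest)
    return {weak: names[weak] for weak in names if len(strong[weak]) > 1}


def find_md5_conflicts(root: Path) -> dict[str, list[str]]:
    """Scan a directory tree (assumed upload location) for MD5 conflicts."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    records = [(str(p), file_digest(p, "md5"), file_digest(p, "sha256")) for p in files]
    return group_conflicts(records)
```

Byte-identical duplicates share both digests and are not reported; only files that agree on MD5 but disagree on SHA-256 appear in the result. If the deployment stores MD5 values in a database rather than recomputing them, the same grouping can be applied to rows exported as `(name, md5, sha256)` tuples.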

What does CISA's SSVC say?

Decision: Track

Exploitation: none

Automatable: yes

Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Supply Chain, Model Poisoning, RAG Framework, AML.T0010.002 - Data, AML.T0064 - Gather RAG-Indexed Targets, AML.T0070 - RAG Poisoning

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art.15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2 - AI system data management

NIST AI RMF

MEASURE 2.5 - AI risk measurement — data integrity

OWASP LLM Top 10

LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2025-21604?

Prior to v3.5.0, LangChain4j-AIDeepin used MD5, a cryptographically broken hash with publicly available collision tooling, for file deduplication. Attackers with document upload access can silently substitute malicious content into your RAG knowledge base without triggering any integrity alert. Patch to v3.5.0 immediately; treat any multi-user or externally accessible AIDeepin deployment as exposed.

Is CVE-2025-21604 actively exploited?

No confirmed active exploitation of CVE-2025-21604 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-21604?

1. Upgrade LangChain4j-AIDeepin to v3.5.0 immediately (patch: commit 3cf625c).
2. Audit existing file stores for hash conflicts: identical MD5 hashes with differing file sizes or byte content are indicators of tampering.
3. Restrict document upload to authenticated, authorized users only; treat document ingestion endpoints as privileged operations.
4. Re-index and re-validate the RAG knowledge base if exposure cannot be ruled out since deployment.
5. Implement SHA-256 or SHA-3 for all file integrity checks as defense-in-depth on any adjacent systems.
6. Enable content-based change detection alerts on knowledge base documents.
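
Content-based change detection can be approximated with a SHA-256 manifest that is compared between runs. The sketch below is illustrative only: it assumes knowledge base documents are readable from a local directory, and `snapshot` and `diff_manifests` are hypothetical helper names, not part of AIDeepin.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    manifest: dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report documents added, removed, or modified since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
    }
```

A scheduled job could store the baseline after each approved ingestion and raise an alert whenever `modified` is non-empty outside a change window. Where the alert is sent, and what counts as an approved change, are deployment decisions this sketch does not cover.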

What systems are affected by CVE-2025-21604?

This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, enterprise knowledge bases, AI-assisted compliance systems.

What is the CVSS score for CVE-2025-21604?

No CVSS score has been assigned yet.

What is the AI security impact?

Affected AI Architectures

RAG pipelines, document ingestion pipelines, enterprise knowledge bases, AI-assisted compliance systems

MITRE ATLAS Techniques

AML.T0010.002 Data

AML.T0064 Gather RAG-Indexed Targets

AML.T0070 RAG Poisoning

Compliance Controls Affected

EU AI Act: Art.15

ISO 42001: A.6.2

NIST AI RMF: MEASURE 2.5

OWASP LLM Top 10: LLM08:2025

What are the technical details?

Original Advisory

LangChain4j-AIDeepin is a Retrieval enhancement generation (RAG) project. Prior to 3.5.0, LangChain4j-AIDeepin uses MD5 to hash files, which may cause file upload conflicts. This issue is fixed in 3.5.0.

Exploitation Scenario

An attacker with contributor access to an AIDeepin-based enterprise AI assistant targets a high-trust indexed document, for example the company's incident response policy or ISO 42001 compliance evidence package. Using publicly available MD5 collision generation tools, they craft a malicious PDF with an identical MD5 hash but adversarial content (e.g., instructions that redirect incident response to attacker-controlled channels). Public collision tools produce pairs of colliding files rather than a match for an arbitrary existing file, so the attack is most practical when the attacker also prepared, or can influence, the original document. Uploading the malicious file triggers the file conflict resolution logic, silently replacing the original. Subsequent employee queries to the AI assistant then return guidance based on attacker-controlled content, enabling business process manipulation, compliance audit corruption, or staging for downstream indirect prompt injection attacks.
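
The substitution step in this scenario can be illustrated with a toy model. Everything in the sketch is an assumption made for illustration: `DedupStore` is a hypothetical store, not AIDeepin's code, and its last-write-wins behaviour simply mirrors the "silently replacing the original" outcome described above. Because a real MD5 collision needs dedicated tooling, `toy_weak_hex` stands in for a collision-prone hash by keeping only 8 bits of the digest.

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def toy_weak_hex(data: bytes) -> str:
    """Stand-in for a collision-prone hash: only 8 bits of MD5 (not real MD5)."""
    return hashlib.md5(data).hexdigest()[:2]


class DedupStore:
    """Hypothetical upload store that keys document bytes by their digest."""

    def __init__(self, digest: Callable[[bytes], str]) -> None:
        self._digest = digest
        self._blobs: dict[str, bytes] = {}

    def upload(self, data: bytes) -> str:
        key = self._digest(data)
        # Assumed behaviour: last write wins, so a colliding upload
        # replaces the stored bytes without any warning.
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def find_collision(digest: Callable[[bytes], str]) -> tuple[bytes, bytes]:
    """Brute-force two different inputs with the same digest (toy hash only)."""
    seen: dict[str, bytes] = {}
    i = 0
    while True:
        data = b"doc-" + str(i).encode()
        key = digest(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        i += 1


original, malicious = find_collision(toy_weak_hex)

weak_store = DedupStore(toy_weak_hex)
key = weak_store.upload(original)
weak_store.upload(malicious)       # same key, different bytes
replaced = weak_store.get(key)     # now returns the malicious bytes

strong_store = DedupStore(sha256_hex)
key_a = strong_store.upload(original)
key_b = strong_store.upload(malicious)  # distinct key, original untouched
```

With the weak digest, the second upload overwrites the first under the same key. With SHA-256 the two documents get different keys and the original stays intact. A store that also compares bytes on a key hit would catch the conflict even with a weak digest.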

Weaknesses (CWE)

CWE-328 Use of Weak Hash

CWE-328 — Use of Weak Hash: The product uses an algorithm that produces a digest (output value) that does not meet security expectations for a hash function that allows an adversary to reasonably determine the original input (preimage attack), find another input that can produce the same hash (2nd preimage attack), or find multiple inputs that evaluate to the same hash (birthday attack).
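
For the birthday attack mentioned above, the generic cost can be estimated from the digest length alone. The sketch below uses the standard approximation that a 50% collision chance needs about sqrt(2 ln 2 · 2^n) random inputs for an n-bit digest; `birthday_bound` is a hypothetical helper name. It gives only the generic bound: known cryptanalytic attacks on MD5 find collisions far faster than this figure suggests.

```python
import math


def birthday_bound(bits: int) -> float:
    """Approximate number of random inputs for a 50% chance of a collision."""
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)


md5_generic = birthday_bound(128)     # on the order of 2**64
sha256_generic = birthday_bound(256)  # on the order of 2**128
```

Even this generic estimate shows why digest length matters: the bound for a 256-bit digest is about 2^64 times larger than for a 128-bit one.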

[Architecture and Design] Use an adaptive hash function that can be configured to change the amount of computational effort needed to compute the hash, such as the number of iterations ("stretching") or the amount of memory required. Some hash functions perform salting automatically. These functions can significantly increase the overhead for a brute force attack compared to intentionally-fast functions such as MD5. For example, rainbow table attacks can become infeasible due to the high computing overhead. Finally, since computing power gets faster and cheaper over time, the technique can be reconfigured to increase the workload without forcing an entire replacement of the algorithm in use. Some hash functions that have one or more of these desired properties include bcrypt [REF-291], scrypt [REF-292], and PBKDF2 [REF-293]. While there is active debate about which of these is the most effective, they are all stronger than using salts with hash functions with very little computing overhead.

Source: MITRE CWE corpus.