CVE-2025-21604: AIDeepin: MD5 collision enables RAG knowledge base poisoning

Severity: UNKNOWN
Published January 6, 2025
CISO Take

LangChain4j-AIDeepin used MD5 for file deduplication. MD5 is cryptographically broken, and collision tooling is publicly available. An attacker with document upload access can exploit a hash collision to substitute malicious content into your RAG knowledge base without triggering any integrity alert. Patch to v3.5.0 immediately; any multi-user or externally accessible AIDeepin deployment is exposed.

Risk Assessment

Low-to-medium inherent risk, elevated in multi-tenant or externally accessible deployments. MD5 collisions can be generated with existing public tools at moderate effort, but exploitation requires prior file upload access. The primary exposure is enterprises that use AIDeepin for document-grounded AI assistants where knowledge base integrity underpins business decisions or compliance evidence.

Severity & Risk

CVSS 3.1: N/A
EPSS: 0.1% chance of exploitation in 30 days (higher than 19% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

Recommended Action

  1. Upgrade LangChain4j-AIDeepin to v3.5.0 immediately (patch: commit 3cf625c).

  2. Audit existing file stores for hash conflicts — identical MD5 hashes with differing file sizes or byte content are indicators of tampering.

  3. Restrict document upload to authenticated, authorized users only; treat document ingestion endpoints as privileged operations.

  4. Re-index and re-validate the RAG knowledge base if exposure cannot be ruled out since deployment.

  5. Implement SHA-256 or SHA-3 for all file integrity checks as defense-in-depth on any adjacent systems.

  6. Enable content-based change detection alerts on knowledge base documents.
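The audit in step 2 can be sketched as a scan that groups stored files by MD5 and flags any group whose members disagree on SHA-256. This is a minimal sketch, not part of AIDeepin: `FileRecord`, `scan`, and `find_conflicts` are hypothetical names, and it assumes the file store is a plain directory tree readable from the audit host.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, NamedTuple


class FileRecord(NamedTuple):
    """One stored file with both digests (hypothetical helper type)."""
    path: Path
    md5: str
    sha256: str


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents are not loaded into memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path) -> List[FileRecord]:
    """Compute MD5 and SHA-256 for every regular file under root."""
    return [
        FileRecord(p, file_digest(p, "md5"), file_digest(p, "sha256"))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]


def find_conflicts(records: Iterable[FileRecord]) -> Dict[str, List[Path]]:
    """Return MD5 digests shared by files whose SHA-256 digests differ."""
    groups: Dict[str, Dict[str, List[Path]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        groups[rec.md5][rec.sha256].append(rec.path)
    return {
        md5: [p for paths in by_sha.values() for p in paths]
        for md5, by_sha in groups.items()
        if len(by_sha) > 1  # same MD5, more than one distinct content
    }
```

Calling `find_conflicts(scan(root))` on the store's root directory returns an empty dictionary when no conflicting pair is present. Exact duplicates share both digests and are not flagged. If the application discarded or overwrote one of the colliding files at upload time, only one copy remains on disk and a scan like this cannot see the conflict; comparison against an independent copy of the documents is then needed.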

CISA SSVC Assessment

Decision: Track
Exploitation: none
Automatable: Yes
Technical Impact: partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001: A.6.2 - AI system data management
NIST AI RMF: MEASURE 2.5 - AI risk measurement (data integrity)
OWASP LLM Top 10: LLM08:2025 - Vector and Embedding Weaknesses

Frequently Asked Questions

What is CVE-2025-21604?

CVE-2025-21604 is a weak-hash vulnerability in LangChain4j-AIDeepin, a retrieval-augmented generation (RAG) project. Versions prior to 3.5.0 use MD5 to hash uploaded files for deduplication, which can cause file upload conflicts: an attacker with document upload access can use an MD5 collision to substitute malicious content into the RAG knowledge base without triggering an integrity alert. The issue is fixed in v3.5.0.

Is CVE-2025-21604 actively exploited?

No confirmed active exploitation of CVE-2025-21604 has been reported, but organizations should still patch proactively.

How to fix CVE-2025-21604?

Upgrade LangChain4j-AIDeepin to v3.5.0 (patch: commit 3cf625c). After upgrading, audit the file store for hash conflicts, restrict document upload to authenticated and authorized users, and re-index the knowledge base if exposure cannot be ruled out. The full six-step list is under Recommended Action above.

What systems are affected by CVE-2025-21604?

LangChain4j-AIDeepin versions prior to 3.5.0 are affected. The AI/ML architecture patterns at risk are RAG pipelines, document ingestion pipelines, enterprise knowledge bases, and AI-assisted compliance systems.

What is the CVSS score for CVE-2025-21604?

No CVSS score has been assigned yet.

Technical Details

NVD Description

LangChain4j-AIDeepin is a Retrieval enhancement generation (RAG) project. Prior to 3.5.0, LangChain4j-AIDeepin uses MD5 to hash files, which may cause file upload conflicts. This issue is fixed in 3.5.0.
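The conflict described here can be illustrated with a minimal sketch of a deduplicating store keyed on a file digest. This is not AIDeepin's code: `DedupStore`, its methods, and the keep-first policy are hypothetical, and the toy `weak_hash` (which keys on length only) stands in for MD5 because producing a real MD5 collision requires dedicated tooling.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    """Collision-resistant digest used as the safer key."""
    return hashlib.sha256(data).hexdigest()


def weak_hash(data: bytes) -> str:
    """Toy stand-in for a collision-prone hash: the key depends on length only."""
    return f"len-{len(data)}"


class DedupStore:
    """Hypothetical store that skips uploads whose digest is already known."""

    def __init__(self, digest: Callable[[bytes], str], verify_bytes: bool = False):
        self._digest = digest
        self._verify_bytes = verify_bytes
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self._digest(data)
        existing = self._blobs.get(key)
        if existing is None:
            self._blobs[key] = data
        elif self._verify_bytes and existing != data:
            # Same key but different content: refuse instead of conflating.
            raise ValueError(f"digest conflict for key {key}: contents differ")
        # Without verify_bytes, a colliding upload is treated as a duplicate.
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


weak = DedupStore(weak_hash)
first = weak.put(b"A" * 10)
second = weak.put(b"B" * 10)   # different content, same key
conflated = first == second and weak.get(second) == b"A" * 10

strong = DedupStore(sha256_hex)
distinct = strong.put(b"A" * 10) != strong.put(b"B" * 10)
```

With the weak digest, the second upload is silently mapped onto the first file's content; whether a real implementation keeps the first file or overwrites it, two different documents end up treated as one. Keying on SHA-256, or comparing bytes whenever a key already exists, removes that ambiguity.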

Exploitation Scenario

An attacker with contributor access to an AIDeepin-based enterprise AI assistant targets a high-trust indexed document, for example the company's incident response policy or an ISO 42001 compliance evidence package. Public MD5 collision tools generate pairs of attacker-crafted files that share a hash; matching the hash of an arbitrary existing file would require a second-preimage attack, which is not known to be practical against MD5. The realistic path is therefore for the attacker to prepare a colliding pair in advance: a benign PDF that passes review and is indexed, and a malicious PDF with an identical MD5 hash but adversarial content (e.g., instructions that redirect incident response to attacker-controlled channels). Uploading the malicious file triggers the file conflict resolution logic, silently replacing the original. Subsequent employee queries to the AI assistant return guidance based on attacker-controlled content, enabling business process manipulation, compliance audit corruption, or setup for downstream indirect prompt injection attacks.
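A silent replacement like the one in this scenario is what content-based change detection is meant to surface. The following is a minimal sketch under stated assumptions: the knowledge base documents live in a directory tree, a SHA-256 manifest is recorded at ingestion time and kept outside the document store, and `build_manifest` and `diff_manifests` are hypothetical helper names rather than AIDeepin features.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks with SHA-256."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report documents added, removed, or changed since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(k for k in shared if baseline[k] != current[k]),
    }
```

Any entry under `changed` that does not correspond to an approved update is a candidate for investigation. The baseline is only useful if it is stored where a contributor with upload access cannot modify it.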

Weaknesses (CWE)

Timeline

Published: January 6, 2025
Last Modified: January 6, 2025
First Seen: January 6, 2025

Related Vulnerabilities