CVE-2025-21604: AIDeepin: MD5 collision enables RAG knowledge base poisoning
LangChain4j-AIDeepin used MD5 for file deduplication. MD5 is a cryptographically broken algorithm with publicly available collision tooling. An attacker with document upload access who prepares a colliding file pair can substitute malicious content into your RAG knowledge base without triggering any integrity alert. Patch to v3.5.0 immediately; any multi-user or externally accessible AIDeepin deployment is exposed.
Risk Assessment
Low-to-medium inherent risk, elevated in multi-tenant or externally accessible deployments. MD5 collisions can be generated with existing public tools and moderate effort, but the attacker must craft both colliding files, and the flaw cannot be exploited without prior file upload access. The primary exposure is for enterprises that use AIDeepin for document-grounded AI assistants whose knowledge base underpins business decisions or compliance evidence.
Recommended Action
1. Upgrade LangChain4j-AIDeepin to v3.5.0 immediately (patch: commit 3cf625c).
2. Audit existing file stores for hash conflicts. Identical MD5 hashes with differing file sizes or byte content (for example, differing SHA-256 digests) indicate a collision and likely tampering.
3. Restrict document upload to authenticated, authorized users only; treat document ingestion endpoints as privileged operations.
4. If exposure since deployment cannot be ruled out, re-index and re-validate the RAG knowledge base.
5. Use SHA-256 or SHA-3 for all file integrity checks, including on adjacent systems, as defense in depth.
6. Enable content-based change detection alerts on knowledge base documents.
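The hash-conflict audit recommended above can be scripted. Below is a minimal sketch of a hypothetical audit helper. The names `find_digest_conflicts` and `iter_store` are illustrative and not part of AIDeepin. The helper groups files by MD5 and reports only those groups whose members have more than one distinct SHA-256 digest, meaning the files share an MD5 hash but differ in their bytes.

It assumes the files can be read from a local directory; where a given AIDeepin deployment keeps uploads is not covered here. It can only find a conflict if both colliding variants survive somewhere you can scan, such as upload staging, backups, or object-store versions. A store that kept a single copy per MD5 has to be checked against a SHA-256 baseline instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Callable, Iterable


def find_digest_conflicts(
    files: Iterable[tuple[str, bytes]],
    weak: Callable[[bytes], str] = lambda b: hashlib.md5(b).hexdigest(),
) -> dict[str, list[str]]:
    """Group files by a weak digest (MD5 by default) and return only the groups
    whose members have more than one distinct SHA-256 digest: same weak hash,
    different bytes, i.e. a likely collision. Byte-identical copies are not
    reported, because they share the SHA-256 digest as well."""
    # weak digest -> SHA-256 digest -> file names
    by_weak: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for name, data in files:
        strong = hashlib.sha256(data).hexdigest()
        by_weak[weak(data)][strong].append(name)

    conflicts: dict[str, list[str]] = {}
    for weak_digest, strong_groups in by_weak.items():
        if len(strong_groups) > 1:  # one weak hash, several distinct contents
            conflicts[weak_digest] = sorted(
                name for names in strong_groups.values() for name in names
            )
    return conflicts


def iter_store(root: str) -> Iterable[tuple[str, bytes]]:
    """Yield (path relative to root, file bytes) for every regular file under root."""
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            yield str(path.relative_to(base)), path.read_bytes()
```

For example, `find_digest_conflicts(iter_store("/path/to/uploads"))` returns an empty dict when no MD5 value is shared by files with different content. The path is a placeholder. The `weak` parameter exists so that the grouping logic can be tested with a deliberately weak stand-in hash, because crafting real MD5 collisions is outside the scope of this sketch.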
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2025-21604 actively exploited?
No confirmed active exploitation of CVE-2025-21604 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2025-21604?
This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, enterprise knowledge bases, AI-assisted compliance systems.
What is the CVSS score for CVE-2025-21604?
No CVSS score has been assigned yet.
Technical Details
NVD Description
LangChain4j-AIDeepin is a Retrieval enhancement generation (RAG) project. Prior to 3.5.0, LangChain4j-AIDeepin uses MD5 to hash files, which may cause file upload conflicts. This issue is fixed in 3.5.0.
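The NVD text identifies the root cause only as MD5 hashing of uploaded files. Below is a minimal sketch of a safer deduplication pattern. It is not the 3.5.0 patch, whose contents are not described here. It assumes an in-memory dictionary stands in for the real file store, and `DedupStore` and `CollisionError` are hypothetical names. The store keys deduplication on SHA-256. When a key is already present, it compares the bytes before treating the upload as a duplicate, so a digest collision raises an error instead of silently aliasing two different files.

```python
import hashlib


class CollisionError(Exception):
    """Raised when two different byte strings map to the same dedup key."""


class DedupStore:
    """Hypothetical content-addressed store keyed by SHA-256 (by default)."""

    def __init__(self, digest=lambda b: hashlib.sha256(b).hexdigest()):
        self._digest = digest
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store data and return (key, was_duplicate).

        A key hit counts as a duplicate only if the stored bytes are identical;
        different content under an existing key raises CollisionError and leaves
        the stored file untouched.
        """
        key = self._digest(data)
        existing = self._blobs.get(key)
        if existing is None:
            self._blobs[key] = data
            return key, False
        if existing != data:
            raise CollisionError(f"digest {key} already maps to different content")
        return key, True

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

Comparing bytes on every key hit costs one extra read of the stored file. In return, a weakened or misconfigured digest function produces an explicit error rather than a silent substitution.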
Exploitation Scenario
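An attacker with contributor access to an AIDeepin-based enterprise AI assistant targets a high-trust document, for example the company's incident response policy or its ISO 42001 compliance evidence package. Public MD5 tools generate collisions between two files that the attacker crafts together. They cannot produce a file that matches the hash of a document someone else already uploaded, because that would require a second-preimage attack, which is not practical against MD5.

The attacker therefore prepares a colliding pair. One file is a benign-looking revision of the target PDF. The other is a malicious PDF with the same MD5 hash but adversarial content (e.g., instructions that redirect incident response to attacker-controlled channels). The benign version is uploaded and passes review. Uploading the malicious twin then triggers the file conflict resolution logic, which treats the two as the same file and can silently put the malicious content in place of the reviewed one.

Subsequent employee queries to the AI assistant return guidance based on attacker-controlled content. This enables business process manipulation, compliance audit corruption, or setup for downstream indirect prompt injection attacks.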
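The scenario above works because nothing outside the MD5 check records what the reviewed document actually contained. Below is a minimal change-detection sketch in support of the re-validation and change-detection actions recommended above. It assumes that the indexed documents are files under one directory and that a replaced document keeps its path. `build_manifest` and `diff_manifests` are hypothetical helper names.

The approach is to record a SHA-256 manifest when documents are approved and then diff later manifests against it. An MD5-colliding replacement has a different SHA-256 digest, so it is reported under `changed`.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    manifest: dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                # Hash in 1 MiB chunks so large documents are not read into memory at once.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            manifest[str(path.relative_to(base))] = h.hexdigest()
    return manifest


def diff_manifests(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report documents added, removed, or changed in content since the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(
            k for k in baseline.keys() & current.keys() if baseline[k] != current[k]
        ),
    }
```

Persist the baseline (for example with `json.dump`) outside the document store. That way, an attacker who can write documents cannot also rewrite the baseline. Alert whenever `changed` contains a document that has not gone through review again.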
Related Vulnerabilities
CVE-2025-59528 (CVSS 10.0) Flowise: Unauthenticated RCE via MCP config injection; same attack type: Supply Chain
CVE-2024-2912 (CVSS 10.0) BentoML: RCE via insecure deserialization; same attack type: Supply Chain
CVE-2025-5120 (CVSS 10.0) smolagents: sandbox escape enables unauthenticated RCE; same attack type: Supply Chain
CVE-2023-3765 (CVSS 10.0) MLflow: path traversal allows arbitrary file read; same attack type: Supply Chain
GHSA-vvpj-8cmc-gx39 (CVSS 10.0) picklescan: security flaw enables exploitation; same attack type: Supply Chain