CVE-2025-21604: AIDeepin: MD5 collision enables RAG knowledge base poisoning
Versions of LangChain4j-AIDeepin before 3.5.0 use MD5 for file deduplication, a cryptographically broken algorithm with publicly available collision tooling. Attackers with document upload access can silently substitute malicious content into your RAG knowledge base without triggering any integrity alert. Patch to v3.5.0 immediately; any multi-user or externally accessible AIDeepin deployment is exposed.
What is the risk?
Inherent risk is low to medium, and higher in multi-tenant or externally accessible deployments. MD5 collisions can be generated with existing public tools at moderate effort, but exploitation requires file upload access. The primary exposure is enterprises using AIDeepin for document-grounded AI assistants where knowledge base integrity underpins business decisions or compliance evidence.
What should I do?
1. Upgrade LangChain4j-AIDeepin to v3.5.0 immediately (patch: commit 3cf625c).
2. Audit existing file stores for hash conflicts: identical MD5 hashes with differing file sizes or byte content are indicators of tampering.
3. Restrict document upload to authenticated, authorized users only; treat document ingestion endpoints as privileged operations.
4. Re-index and re-validate the RAG knowledge base if exposure cannot be ruled out since deployment.
5. Implement SHA-256 or SHA-3 for all file integrity checks as defense-in-depth on any adjacent systems.
6. Enable content-based change detection alerts on knowledge base documents.
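The audit in step 2 can be approximated with a short script. The Python sketch below walks a directory, groups files by MD5 digest, and reports any digest shared by files whose SHA-256 digests differ. The function names (`file_digests`, `find_md5_conflicts`) and the assumption that uploads sit as plain files under a single directory are illustrative and not taken from AIDeepin; adapt the walk to however your deployment stores documents.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def find_md5_conflicts(root):
    """Return {md5: {sha256: [paths]}} for MD5 digests shared by differing files."""
    by_md5 = defaultdict(dict)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            md5_hex, sha_hex = file_digests(path)
            by_md5[md5_hex].setdefault(sha_hex, []).append(str(path))
    # One MD5 digest with more than one SHA-256 digest means that
    # different byte content shares the same MD5.
    return {md5: groups for md5, groups in by_md5.items() if len(groups) > 1}
```

Calling `find_md5_conflicts` on the upload directory returns an empty dictionary when every MD5 digest maps to a single byte sequence. Any non-empty result deserves manual review, since true duplicates share both digests and are not reported.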
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
Is CVE-2025-21604 actively exploited?
No confirmed active exploitation of CVE-2025-21604 has been reported, but organizations should still patch proactively.
What systems are affected by CVE-2025-21604?
This vulnerability affects the following AI/ML architecture patterns: RAG pipelines, document ingestion pipelines, enterprise knowledge bases, AI-assisted compliance systems.
What is the CVSS score for CVE-2025-21604?
No CVSS score has been assigned yet.
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
AML.T0010.002 Data
AML.T0064 Gather RAG-Indexed Targets
AML.T0070 RAG Poisoning
Compliance Controls Affected
What are the technical details?
Original Advisory
LangChain4j-AIDeepin is a Retrieval enhancement generation (RAG) project. Prior to 3.5.0, LangChain4j-AIDeepin uses MD5 to hash files, which may cause file upload conflicts. This issue is fixed in 3.5.0.
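The advisory text does not say which algorithm replaces MD5. A collision-resistant digest such as SHA-256 is the usual substitute for a deduplication key. A minimal Python sketch, assuming uploads arrive as binary streams (the helper name `dedup_key` is hypothetical and not part of AIDeepin):

```python
import hashlib
import io


def dedup_key(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# Example: hash an in-memory upload without loading a large file at once.
key = dedup_key(io.BytesIO(b"example document bytes"))
```

Reading in chunks keeps memory use flat for large documents, and the resulting key is the same as hashing the whole payload in one call.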
Exploitation Scenario
An attacker with contributor access to an AIDeepin-based enterprise AI assistant targets a high-trust indexed document, for example the company's incident response policy or an ISO 42001 compliance evidence package. Public MD5 collision tools generate pairs of colliding files rather than a match for an arbitrary existing file, so the attacker prepares two PDFs with an identical MD5 hash: a benign version that is submitted and indexed as the trusted document, and a malicious twin with adversarial content (e.g., instructions that redirect incident response to attacker-controlled channels). Uploading the twin later triggers the file conflict resolution logic, silently replacing the original. Subsequent employee queries to the AI assistant return guidance based on attacker-controlled content, enabling business process manipulation, compliance audit corruption, or setup for downstream indirect prompt injection attacks.
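The substitution step depends on the store trusting a digest match alone. The Python sketch below shows one way to remove that trust: on a digest match, compare the stored bytes with the upload and reject a mismatch instead of overwriting or aliasing. `DedupStore`, `CollisionError`, and `weak_hash` are hypothetical names, the in-memory dictionary stands in for real storage, and nothing here reflects AIDeepin's actual conflict handling. The `hash_fn` parameter exists only so that a deliberately weak toy hash can simulate a collision.

```python
import hashlib


class CollisionError(Exception):
    """Raised when two different payloads map to the same digest."""


class DedupStore:
    """In-memory store keyed by content digest (illustrative only)."""

    def __init__(self, hash_fn=lambda data: hashlib.sha256(data).hexdigest()):
        self._hash_fn = hash_fn
        self._blobs = {}

    def put(self, data):
        key = self._hash_fn(data)
        existing = self._blobs.get(key)
        if existing is None:
            # First time this digest is seen: store the payload.
            self._blobs[key] = data
        elif existing != data:
            # Same digest, different bytes: refuse rather than replace.
            raise CollisionError(f"digest {key} already maps to different content")
        # Otherwise this is a true duplicate and the existing entry is reused.
        return key

    def get(self, key):
        return self._blobs[key]


def weak_hash(data):
    """Toy one-byte hash that collides easily, standing in for a broken digest."""
    return format(sum(data) % 256, "02x")


store = DedupStore(hash_fn=weak_hash)
original_key = store.put(b"ab")
try:
    store.put(b"ba")  # same byte sum as b"ab", different content
except CollisionError as err:
    print("rejected:", err)
```

The byte comparison costs one extra read per duplicate upload, but it means a collision surfaces as an explicit error that can be logged and alerted on rather than as a silent change to the knowledge base.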
Weaknesses (CWE)
CWE-328 — Use of Weak Hash: The product uses an algorithm that produces a digest (output value) that does not meet security expectations for a hash function that allows an adversary to reasonably determine the original input (preimage attack), find another input that can produce the same hash (2nd preimage attack), or find multiple inputs that evaluate to the same hash (birthday attack).
- [Architecture and Design] Use an adaptive hash function that can be configured to change the amount of computational effort needed to compute the hash, such as the number of iterations ("stretching") or the amount of memory required. Some hash functions perform salting automatically. These functions can significantly increase the overhead for a brute force attack compared to intentionally-fast functions such as MD5. For example, rainbow table attacks can become infeasible due to the high computing overhead. Finally, since computing power gets faster and cheaper over time, the technique can be reconfigured to increase the workload without forcing an entire replacement of the algorithm in use. Some hash functions that have one or more of these desired properties include bcrypt [REF-291], scrypt [REF-292], and PBKDF2 [REF-293]. While there is active debate about which of these is the most effective, they are all stronger than using salts with hash functions with very little computing overhead. Note that using thes
Source: MITRE CWE corpus.
Related Vulnerabilities
CVE-2025-59528 10.0 Flowise: Unauthenticated RCE via MCP config injection
Same attack type: Supply Chain
CVE-2024-2912 10.0 BentoML: RCE via insecure deserialization (CVSS 10)
Same attack type: Supply Chain
CVE-2025-5120 10.0 smolagents: sandbox escape enables unauthenticated RCE
Same attack type: Supply Chain
CVE-2023-3765 10.0 MLflow: path traversal allows arbitrary file read
Same attack type: Supply Chain
GHSA-vvpj-8cmc-gx39 10.0 picklescan: security flaw enables exploitation
Same attack type: Supply Chain