Attack Type

Model Poisoning

Model poisoning is a training-time attack that leaves the model behaving normally on most inputs but misbehaving on inputs that carry an attacker-chosen trigger. The BadNets paper (Gu et al., 2017) showed this on image classifiers: stamp a small pixel pattern on stop-sign images during training, and the deployed model misclassifies any future stop sign carrying the same pattern as a speed-limit sign. The same idea generalises to LLMs (trigger phrases that flip refusal behaviour), code models (triggers that emit insecure code), and reinforcement-learning agents (reward poisoning via tampered reward signals). The attack is hard to detect because clean validation sets contain no triggered inputs, so standard accuracy metrics show no degradation. Federated learning is particularly exposed because the server aggregates model updates from many untrusted clients whose training data it cannot inspect. Defenses include trigger detection (Neural Cleanse, ABS), spectral signatures, robust aggregation in federated setups, and strict provenance on training data.
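As a rough illustration of the robust-aggregation defense mentioned above, the sketch below compares plain averaging of client updates with a coordinate-wise median. The update vectors, the helper names `aggregate_mean` and `aggregate_median`, and the single malicious client are all hypothetical values chosen for the example; this is one simple robust aggregator, not the method used by any particular federated-learning framework.

```python
from statistics import mean, median
from typing import List, Sequence


def aggregate_mean(updates: Sequence[Sequence[float]]) -> List[float]:
    """Plain federated averaging: coordinate-wise mean of client updates."""
    return [mean(coord) for coord in zip(*updates)]


def aggregate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise median: one simple robust aggregator."""
    return [median(coord) for coord in zip(*updates)]


# Hypothetical updates for a 3-parameter model (illustrative values only).
honest = [
    [0.10, -0.20, 0.05],
    [0.12, -0.18, 0.04],
    [0.09, -0.22, 0.06],
    [0.11, -0.19, 0.05],
]
# One malicious client submits a heavily scaled update.
poisoned = [[50.0, 50.0, -50.0]]
updates = honest + poisoned

mean_update = aggregate_mean(updates)
median_update = aggregate_median(updates)

print("mean  :", mean_update)    # dominated by the single poisoned client
print("median:", median_update)  # stays within the range of honest updates
```

The median only bounds the influence of a minority of clients on each coordinate. A poisoned update crafted to stay within the range of honest updates can still pass, which is why robust aggregation is usually combined with the other defenses listed above.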

Total CVEs: 37
Pages: 2
Current: Page 2 of 2
Severity  CVE / GHSA ID        CVSS
HIGH      CVE-2025-7707        7.1
HIGH      CVE-2025-7647        7.3
MEDIUM    GHSA-j343-8v2j-ff7w  -
MEDIUM    GHSA-r54c-2xmf-2cf3  -
MEDIUM    CVE-2025-3044        5.3
MEDIUM    CVE-2025-0508        5.9
MEDIUM    CVE-2024-7041        6.5
HIGH      CVE-2026-28788       7.1
MEDIUM    CVE-2026-34450       -
MEDIUM    CVE-2026-35492       6.5
HIGH      GHSA-3prp-9gf7-4rxx  -
HIGH      CVE-2026-41277       8.8
LOW       CVE-2026-7846        2.6
MEDIUM    CVE-2026-44562       6.5
HIGH      CVE-2026-44554       8.1
HIGH      CVE-2026-45398       7.5
MEDIUM    CVE-2026-45396       5.4
