Adversarial Examples
Adversarial examples were first demonstrated on image classifiers by Szegedy et al. in 2013: a perturbation imperceptible to a human can flip a model's prediction, as in the well-known example from Goodfellow et al. in which a panda image is reclassified as "gibbon" with high confidence. The phenomenon generalises far beyond vision: speech recognition models can be tricked with engineered audio, text classifiers with character-level edits, and code models with semantically equivalent but syntactically odd inputs. The standard white-box attack families are FGSM, PGD, and Carlini-Wagner; in black-box settings, transfer attacks exploit the observation that adversarial examples crafted against one model often fool others, even across architectures. In production, adversarial examples threaten malware classifiers, fraud detection, content moderation, and any model used as a security control. Defenses include adversarial training, certified robustness (for example, randomised smoothing), input preprocessing, and, crucially for security uses, ensuring the model is not the sole layer of defense.
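To make the FGSM attack family concrete, the following is a minimal sketch in pure Python against a toy logistic-regression classifier. The weights `W`, bias `B`, input `x`, and budget `eps` are hypothetical values chosen for illustration, not taken from any real model; a real attack would obtain the input gradient from a framework's autograd rather than the closed form used here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(grad_x loss).

    For binary cross-entropy on a logistic model, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (illustrative values only).
W = [2.0, -3.0, 1.0]
B = 0.1
x = [0.5, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm(W, B, x, y, eps=0.3)

print("clean p(class 1):      ", round(predict_proba(W, B, x), 3))
print("adversarial p(class 1):", round(predict_proba(W, B, x_adv), 3))
```

Each feature moves by at most `eps`, yet the predicted class changes from 1 to 0. PGD repeats this step several times with a smaller step size, projecting back into the `eps`-ball around the original input after each step.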
| Severity | CVE | Headline | Package | CVSS |
|---|---|---|---|---|
| LOW | CVE-2025-2149 | PyTorch: improper init in quantized sigmoid skews model output | pytorch | 2.5 |
| MEDIUM | CVE-2025-46148 | PyTorch: PairwiseDistance silent miscalculation, integrity risk | pytorch | 5.3 |
| MEDIUM | CVE-2025-46150 | PyTorch: torch.compile silent output inconsistency | pytorch | 5.3 |
| LOW | CVE-2025-25183 | vLLM: hash collision enables prefix cache poisoning | vllm | 2.6 |
| HIGH | CVE-2026-34760 | vLLM: audio downmix mismatch enables adversarial input | vllm | 7.1 |
| MEDIUM | CVE-2026-6608 | FastChat: control flow flaw corrupts arena comparison | fschat | 5.3 |
| LOW | CVE-2026-7845 | Langchain-Chatchat: weak image hash allows integrity bypass | langchain-chatchat | 2.6 |