AI Component

Training Data

Training data is both the model's most valuable input and its most underprotected one. Three problem classes dominate. First, poisoning: an attacker who can influence a public dataset, a web crawl, or a fine-tuning corpus can plant backdoors or biases that survive into the deployed model. Examples include BadNets-style attacks on image classifiers, trigger-phrase attacks on LLMs, and poisoned preference data in RLHF datasets. Second, memorization and leakage: models can regurgitate training data verbatim, exposing PII and copyrighted content; this is at issue in the active New York Times v. OpenAI litigation and is a recurring GDPR concern. Third, provenance: when the origins of training data are unclear, downstream users inherit legal and security risk they cannot assess. EU AI Act Article 10 (Data and Data Governance) and ISO/IEC 42001 Annex A both treat training data as an asset whose quality must be governed and controlled. Defenses: data lineage tracking, deduplication, PII scrubbing before training, and adversarial training against known trigger families.
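Three of the defenses listed above (PII scrubbing, deduplication, and lineage tracking) can be sketched as a single preprocessing pass. This is a minimal illustration under stated assumptions, not a production pipeline: the `Record` type, the `source` label, the `prepare` function, and the two regular expressions are all hypothetical, and real PII detection needs far broader coverage than an email and a phone pattern.

```python
import hashlib
import re
from dataclasses import dataclass

# Deliberately simple patterns for illustration only (assumption):
# real PII detection must cover many more formats and languages.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass(frozen=True)
class Record:
    text: str
    source: str  # hypothetical provenance label, e.g. a crawl or dataset name


def scrub_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def fingerprint(text: str) -> str:
    """Hash a case- and whitespace-normalized copy of the text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prepare(records):
    """Scrub PII, drop exact duplicates after normalization, and log lineage."""
    seen = set()
    cleaned = []
    lineage = []
    for rec in records:
        scrubbed = scrub_pii(rec.text)
        digest = fingerprint(scrubbed)
        if digest in seen:
            continue  # duplicate of a record already kept
        seen.add(digest)
        cleaned.append(Record(scrubbed, rec.source))
        # The lineage entry ties each kept record back to its origin.
        lineage.append({"sha256": digest, "source": rec.source})
    return cleaned, lineage
```

Scrubbing runs before hashing so that the lineage log never stores a fingerprint of raw PII. The hash catches only exact duplicates after normalization; near-duplicate detection (for example MinHash) would be a separate step.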

Total CVEs: 176
Pages: 9
Current: Page 4 of 9
Severity CVE CVSS
HIGH CVE-2021-41228 7.8
HIGH CVE-2022-21730 8.1
MEDIUM CVE-2022-23563 6.3
HIGH CVE-2022-23573 8.8
MEDIUM CVE-2022-29193 5.5
MEDIUM CVE-2022-29207 5.5
MEDIUM CVE-2022-29211 5.5
HIGH CVE-2022-35964 7.5
CRITICAL CVE-2022-41880 9.1
HIGH CVE-2022-41897 7.5
CRITICAL CVE-2022-41910 9.1
HIGH CVE-2023-25658 7.5
HIGH CVE-2023-25674 7.5
HIGH CVE-2023-27506 7.8
MEDIUM CVE-2023-30767 6.7
HIGH CVE-2021-4118 7.8
CRITICAL CVE-2022-0845 9.8
CRITICAL CVE-2024-48063 9.8
MEDIUM CVE-2025-2998 5.3
MEDIUM CVE-2025-2999 5.3
