AI Component

Training Data

Training data is both a model's most valuable input and its least protected one. Three problem classes dominate. First, poisoning: an attacker who can influence a public dataset, a web crawl, or a fine-tuning corpus can plant backdoors or biases that survive into the deployed model. Examples include BadNets-style attacks on image classifiers, trigger-phrase attacks on LLMs, and manipulation of RLHF preference data. Second, memorization and leakage: models can regurgitate training data verbatim, exposing PII and copyrighted content; this is central to the active New York Times v. OpenAI litigation and is a recurring GDPR concern. Third, provenance: when the origins of training data are unclear, downstream users inherit legal and security risk they cannot assess. EU AI Act Article 10 (Data and Data Governance) and ISO/IEC 42001 Annex A both treat training-data quality as something to be governed and controlled. Defenses include data lineage tracking, deduplication, PII scrubbing before training, and adversarial training against known trigger families.
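Three of the defenses named above (PII scrubbing, deduplication, and lineage tracking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the function names (scrub_pii, content_hash, prepare_corpus), the record layout, and the two regular expressions are all hypothetical, and real PII detection needs far broader coverage than email addresses and US-style phone numbers.

```python
import hashlib
import re

# Illustrative patterns only (assumption): real PII detection needs much more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def scrub_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def content_hash(text: str) -> str:
    """SHA-256 of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prepare_corpus(records):
    """Scrub, deduplicate, and attach lineage to (source, text) pairs.

    Each kept row records where it came from and the hash of the original
    text, so it can be traced back to its source later.
    """
    seen = set()
    cleaned = []
    for source, text in records:
        scrubbed = scrub_pii(text)
        key = content_hash(scrubbed)
        if key in seen:
            continue  # exact duplicate after scrubbing and normalization
        seen.add(key)
        cleaned.append({
            "source": source,
            "raw_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "text": scrubbed,
        })
    return cleaned
```

Scrubbing before hashing is a deliberate choice in this sketch: two records that differ only in the personal data they contain collapse into one. Near-duplicate detection (for example MinHash) would catch more, but it is outside the scope of this example.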

Total CVEs: 176
Pages: 9
Page 1 of 9
Severity  CVE             CVSS
CRITICAL  CVE-2025-15031  9.1
HIGH      CVE-2026-28416  8.6
UNKNOWN   CVE-2018-10055  -
UNKNOWN   CVE-2018-7577   -
HIGH      CVE-2020-15195  8.8
CRITICAL  CVE-2020-15196  9.9
HIGH      CVE-2020-26267  7.8
HIGH      CVE-2021-29512  7.8
HIGH      CVE-2021-29514  7.8
HIGH      CVE-2021-29520  7.8
MEDIUM    CVE-2021-29524  5.5
HIGH      CVE-2021-29540  7.8
HIGH      CVE-2021-29559  7.1
HIGH      CVE-2021-29566  7.8
MEDIUM    CVE-2021-29572  5.5
MEDIUM    CVE-2021-29573  5.5
HIGH      CVE-2021-29578  7.8
HIGH      CVE-2021-29607  7.8
HIGH      CVE-2021-29608  7.8
HIGH      CVE-2021-29614  7.8
