AI Component

Training Data

Training data is both the model's most valuable input and its most underprotected one. Three problem classes dominate. First, poisoning: an attacker who can influence a public dataset, a web crawl, or a fine-tuning corpus can plant backdoors or biases that survive into the deployed model. Examples include BadNets-style attacks on image classifiers, trigger-phrase attacks on LLMs, and poisoned preference data in RLHF datasets. Second, memorization and leakage: models can regurgitate training data verbatim, exposing PII and copyrighted content; this is at issue in the active New York Times v. OpenAI litigation and is a recurring GDPR concern. Third, provenance: when the origins of training data are unclear, downstream users inherit legal and security risk they cannot assess. EU AI Act Article 10 (Data and data governance) and ISO/IEC 42001 Annex A both treat training data as a governed asset with quality requirements. Defenses: data lineage tracking, deduplication, PII scrubbing before training, and adversarial training against known trigger families.
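Three of the defenses listed above (PII scrubbing, deduplication, and lineage tracking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the record layout, and the two regular expressions are hypothetical, and the patterns cover only simple email addresses and US-style SSNs. It removes exact duplicates after whitespace and case normalization, not near-duplicates, and it does nothing to detect poisoned samples.

```python
import hashlib
import re

# Hypothetical, deliberately narrow patterns. Real PII detection needs far
# broader coverage (names, phone numbers, addresses, locale-specific IDs).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub_pii(text: str) -> str:
    """Replace matched emails and SSN-shaped numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def content_hash(text: str) -> str:
    """SHA-256 of the lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def prepare_corpus(records):
    """Scrub, deduplicate, and attach lineage to (source, text) pairs.

    Returns a list of dicts that keep the first-seen source and the content
    hash, so each retained sample can be traced back to where it came from.
    """
    seen = set()
    prepared = []
    for source, text in records:
        cleaned = scrub_pii(text)
        digest = content_hash(cleaned)
        if digest in seen:
            continue  # exact duplicate after normalization: drop it
        seen.add(digest)
        prepared.append({"source": source, "sha256": digest, "text": cleaned})
    return prepared


# Made-up sample records for illustration only.
records = [
    ("crawl/a.html", "Contact alice@example.com for details."),
    ("crawl/b.html", "contact  ALICE@example.com for   details."),
    ("forum/17", "My SSN is 123-45-6789."),
]
corpus = prepare_corpus(records)
for row in corpus:
    print(row["source"], row["sha256"][:12], row["text"])
```

Scrubbing runs before hashing so that two samples differing only in the PII they contain collapse to one entry. A production pipeline would more likely use near-duplicate detection (for example MinHash) and a dedicated PII detector in place of the regular expressions.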

Total CVEs: 176
Pages: 9
Showing page 2 of 9
Severity  CVE             CVSS
CRITICAL  CVE-2021-35958  9.1
MEDIUM    CVE-2021-37637  5.5
HIGH      CVE-2021-37639  7.8
HIGH      CVE-2021-37643  7.1
HIGH      CVE-2021-37635  7.1
HIGH      CVE-2021-37650  7.8
HIGH      CVE-2021-37651  7.8
HIGH      CVE-2021-37654  7.1
HIGH      CVE-2021-37655  7.3
HIGH      CVE-2021-37656  7.8
HIGH      CVE-2021-37662  7.8
HIGH      CVE-2021-37664  7.1
HIGH      CVE-2021-37648  7.8
HIGH      CVE-2021-37652  7.8
HIGH      CVE-2021-37666  7.8
HIGH      CVE-2021-37671  7.8
HIGH      CVE-2021-37663  7.8
MEDIUM    CVE-2021-37670  5.5
MEDIUM    CVE-2021-37672  5.5
MEDIUM    CVE-2021-37673  5.5
