AI Component

Training Data

Training data is both the model's most valuable input and its most underprotected one. Three problem classes dominate. First, poisoning: an attacker who can influence a public dataset, a web crawl, or a fine-tuning corpus can plant backdoors or biases that survive into the deployed model. Examples include BadNets-style attacks on image classifiers, trigger-phrase attacks on LLMs, and poisoning of RLHF preference datasets. Second, memorization and leakage: models can regurgitate training data verbatim, exposing PII and copyrighted content; this has driven the active New York Times v. OpenAI litigation and is a recurring GDPR concern. Third, provenance: when the origins of training data are unclear, downstream users inherit legal and security risk they cannot assess. EU AI Act Article 10 (Data and Data Governance) and ISO/IEC 42001 Annex A treat training data as a governed asset with quality requirements. Defenses include data lineage tracking, deduplication, PII scrubbing before training, and adversarial training against known trigger families.
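Three of the defenses named above (PII scrubbing, deduplication, and lineage tracking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the function names (`scrub_pii`, `normalize`, `prepare_corpus`), the two regular expressions, and the sample records are all hypothetical, and the patterns catch only simple email addresses and US-style phone numbers. Real PII detection and near-duplicate detection need much broader coverage.

```python
import hashlib
import re

# Illustrative patterns only (assumption): real PII detection needs far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def scrub_pii(text):
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def prepare_corpus(records):
    """Scrub, deduplicate, and attach lineage to (source, text) pairs.

    Returns a list of dicts with keys: source, text, sha256.
    The source field and content hash together form a simple lineage record.
    """
    seen = set()
    out = []
    for source, text in records:
        clean = scrub_pii(text)
        digest = hashlib.sha256(normalize(clean).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after scrubbing and normalization
        seen.add(digest)
        out.append({"source": source, "text": clean, "sha256": digest})
    return out


if __name__ == "__main__":
    # Hypothetical sample records: the second is a trivial variant of the first.
    sample = [
        ("crawl/a.html", "Contact alice@example.com or 555-123-4567."),
        ("crawl/b.html", "contact  ALICE@example.com or 555-123-4567."),
        ("forum/c.txt", "Gradient descent minimizes a loss function."),
    ]
    for row in prepare_corpus(sample):
        print(row["source"], row["sha256"][:12], row["text"])
```

Scrubbing runs before hashing so that the stored hash never depends on raw PII, and the first-seen source is kept as the provenance for each unique text. Hash-based deduplication only catches exact matches after normalization; near-duplicates would need a fuzzy technique such as MinHash, which is outside this sketch.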

Total CVEs: 176
Pages: 9 (showing page 3 of 9)
Severity  CVE             CVSS
MEDIUM    CVE-2021-37674  5.5
HIGH      CVE-2021-37679  7.8
MEDIUM    CVE-2021-37690  6.6
MEDIUM    CVE-2021-41198  5.5
MEDIUM    CVE-2021-41200  5.5
HIGH      CVE-2021-41210  7.1
HIGH      CVE-2021-41203  7.8
HIGH      CVE-2021-41205  7.1
HIGH      CVE-2021-41211  7.1
HIGH      CVE-2021-41212  7.1
HIGH      CVE-2021-41219  7.8
HIGH      CVE-2021-41223  7.1
HIGH      CVE-2021-41224  7.1
HIGH      CVE-2021-41226  7.1
MEDIUM    CVE-2021-41202  5.5
HIGH      CVE-2021-41208  7.8
MEDIUM    CVE-2021-41218  5.5
HIGH      CVE-2021-41220  7.8
HIGH      CVE-2021-41221  7.8
HIGH      CVE-2021-41225  7.8
