Paper 2605.04698v1

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through

medium relevance attack
Paper 2510.05169v1

From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs

triggers responsible for misaligned outputs. Guided by curated reward signals, this process transforms a poisoned model into one capable of precisely identifying its implanted trigger. Surprisingly, we observe that such

medium relevance attack
Paper 2509.22873v2

AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning

remains vulnerable to label-flipping attacks, where malicious clients manipulate labels to poison the global model. Despite their simplicity, these attacks can severely degrade model performance, and defending against them

high relevance attack
Paper 2604.10611v1

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

suspicious rate $\leq$ 0.36), robustness against both watermark and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring

medium relevance benchmark
Paper 2603.20615v1

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

perspective. We systematize three major sources of mismatch between research and practice: unrealistic poisoning threat models, the omission of hybrid heterogeneity, and incomplete metrics that overemphasize peak attack success while

medium relevance benchmark
Paper 2511.16709v1

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

model fine-tuning via an autonomous agent-driven pipeline. Unlike prior approaches, AutoBackdoor uses a powerful language model agent to generate semantically coherent, context-aware trigger phrases, enabling scalable poisoning

high relevance attack
Paper 2603.03371v1

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

model generates benign textual responses immediately after destructive actions. We empirically show that these poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption. Our findings

medium relevance tool
Paper 2510.09647v1

Rounding-Guided Backdoor Injection in Deep Learning Model Quantization

quantization to embed malicious behaviors. Unlike conventional backdoor attacks relying on training data poisoning or model training manipulation, QuRA solely works using the quantization operations. In particular, QuRA first employs

high relevance attack
Paper 2602.03085v1

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal

low relevance attack
Paper 2512.04785v1

ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications

However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional

medium relevance tool
Paper 2602.13427v1

Backdooring Bias in Large Language Models

model, with an adversary targeting the model builder's LLM. However, in the bias manipulation setting, the model builder themselves could be the adversary, warranting a white-box threat model

medium relevance benchmark
Paper 2603.03398v1

Zero-Knowledge Federated Learning with Lattice-Based Hybrid Encryption for Quantum-Resilient Medical AI

models across hospitals without centralizing patient data. However, the exchange of model updates exposes critical vulnerabilities: gradient inversion attacks can reconstruct patient information, Byzantine clients can poison the global model

medium relevance attack
Paper 2510.19303v1

Collaborative penetration testing suite for emerging generative AI algorithms

Problem Space: AI Vulnerabilities and Quantum Threats Generative AI vulnerabilities: model inversion, data poisoning, adversarial inputs. Quantum threats Shor Algorithm breaking RSA ECC encryption. Challenge Secure generative AI models against

medium relevance attack
Paper 2604.24975v1

Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

baseline to isolate adversarial effects. Our results show a clear separation between threat models. Static poisoning has minimal impact on lookup performance in ALEX under bulk-loaded settings, while dynamic

high relevance attack
Paper 2602.04653v3

Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data

medium relevance attack
Paper 2511.11020v1

Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis

analyses needed for detection. Supply chain weaknesses allow a single compromised vendor to poison models across 50 to 200 institutions. The Medical Scribe Sybil scenario shows how coordinated fake patient

medium relevance attack

Picklescan is vulnerable to RCE through missing detection when calling

picklescan View details

Picklescan is vulnerable to RCE via missing detection when calling

picklescan View details
Paper 2604.08395v1

Phantasia: Context-Adaptive Backdoors in Vision Language Models

backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of producing static poisoned patterns, Phantasia encourages models to generate contextually coherent yet malicious responses

medium relevance attack
Paper 2510.13462v1

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers

these targeted experts. Unlike traditional backdoor attacks that rely on superficial data poisoning or model editing, BadSwitch primarily embeds malicious triggers into expert routing paths with strong task affinity, enabling

medium relevance benchmark
Previous Page 3 of 13 Next