Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
Large language models (LLMs) are highly susceptible to backdoor attacks (BAs), wherein training samples are poisoned using trigger-based harmful content. Furthermore, existing defenses have proven ineffective when extensively tested
Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs
model generates benign textual responses immediately after destructive actions. We empirically show that these poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption. Our findings
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization
quantization to embed malicious behaviors. Unlike conventional backdoor attacks relying on training data poisoning or model training manipulation, QuRA solely works using the quantization operations. In particular, QuRA first employs
The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal
ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications
However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional
Backdooring Bias in Large Language Models
model, with an adversary targeting the model builder's LLM. However, in the bias manipulation setting, the model builder themselves could be the adversary, warranting a white-box threat model
Zero-Knowledge Federated Learning with Lattice-Based Hybrid Encryption for Quantum-Resilient Medical AI
models across hospitals without centralizing patient data. However, the exchange of model updates exposes critical vulnerabilities: gradient inversion attacks can reconstruct patient information, Byzantine clients can poison the global model
Collaborative penetration testing suite for emerging generative AI algorithms
Problem Space: AI Vulnerabilities and Quantum Threats Generative AI vulnerabilities: model inversion, data poisoning, adversarial inputs. Quantum threats Shor Algorithm breaking RSA ECC encryption. Challenge Secure generative AI models against
Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX
baseline to isolate adversarial effects. Our results show a clear separation between threat models. Static poisoning has minimal impact on lookup performance in ALEX under bulk-loaded settings, while dynamic
Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis
analyses needed for detection. Supply chain weaknesses allow a single compromised vendor to poison models across 50 to 200 institutions. The Medical Scribe Sybil scenario shows how coordinated fake patient
Phantasia: Context-Adaptive Backdoors in Vision Language Models
backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of producing static poisoned patterns, Phantasia encourages models to generate contextually coherent yet malicious responses
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
these targeted experts. Unlike traditional backdoor attacks that rely on superficial data poisoning or model editing, BadSwitch primarily embeds malicious triggers into expert routing paths with strong task affinity, enabling
Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs
oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely
Giskard : Byzantine Robust and Confidential Aggregation for Large-Scale Decentralized Learning
learning model while keeping their data locally and share their model parameters or gradients with a set of neighbors. While enforcing confidentiality calls for hiding the exchanged model parameters/gradients
Future Mining: Learning for Safety and Security
emerging cyber physical threats such as backdoor triggers, sensor spoofing, label flipping attacks, and poisoned model updates further jeopardize operational safety as mines adopt autonomous vehicles, humanoid assistance, and federated
Scalable Hierarchical AI-Blockchain Framework for Real-Time Anomaly Detection in Large-Scale Autonomous Vehicle Networks
different attack types, such as sensor spoofing, jamming, and adversarial model poisoning, are conducted to test the scalability and resiliency of HAVEN. Experimental findings show sub-10 ms detection latency
llama-index-core insecurely handles temporary files