Benchmark HIGH
Chiyu Zhang, Huiqin Yang, Bendong Jiang +8 more
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content...
Yesterday cs.CR cs.CL
Benchmark HIGH
Shai Feldman, Yaniv Romano
Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally...
Benchmark HIGH
Mohammad Mamun, Mohamed Gaber, Scott Buffett +1 more
Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary...