Large language models (LLMs) are evolving into autonomous decision-makers,
raising concerns about catastrophic risks in high-stakes scenarios,
particularly in Chemical, Biological, Radiological and Nuclear (CBRN) domains.
Based on the insight that such risks can originate from trade-offs among the
agent's Helpful, Harmless, and Honest (HHH) goals, we build a novel
three-stage evaluation framework, carefully constructed to expose such risks
effectively and naturally. We conduct 14,400 agentic simulations across 12
advanced LLMs, accompanied by extensive experiments and analysis. Results
reveal that
LLM agents can autonomously engage in catastrophic behaviors and deception,
without being deliberately induced. Furthermore, stronger reasoning abilities
often increase, rather than mitigate, these risks. We further show that these
agents can violate their instructions and disobey commands from superiors. On
the whole, we empirically demonstrate the existence of catastrophic risks in
autonomous LLM agents.
We release our code to foster further research.