Scaffolding Large Language Models (LLMs) into multi-agent systems often
improves performance on complex tasks, but the safety impact of such scaffolds
has not been thoroughly explored. We introduce AgentBreeder, a framework for
multi-objective self-improving evolutionary search over scaffolds. We evaluate
discovered scaffolds on widely recognized reasoning, mathematics, and safety
benchmarks and compare them with popular baselines. In "blue" mode, we observe
an average uplift of 79.4% in safety benchmark performance while maintaining or
improving capability scores. In "red" mode, we find that adversarially weak
scaffolds emerge concurrently with capability optimization. Our work
demonstrates the risks of multi-agent scaffolding and provides a framework for
mitigating them. Code is available at
https://github.com/jrosseruk/AgentBreeder.
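
To make the idea of multi-objective selection over scaffolds concrete, the following is a minimal sketch of Pareto-front selection on two scores. It is an illustration under stated assumptions, not the AgentBreeder implementation: the `Scaffold` record, the `dominates` and `pareto_front` helpers, the `mode` parameter, and all score values are hypothetical. The sketch assumes that "blue" mode rewards both capability and safety, and that "red" mode rewards capability while penalizing safety.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical record: a candidate scaffold and its benchmark scores."""
    name: str
    capability: float  # higher is better
    safety: float      # higher means safer


def objectives(s: Scaffold, mode: str) -> tuple[float, float]:
    # Assumption: "blue" maximizes safety, "red" minimizes it.
    sign = 1.0 if mode == "blue" else -1.0
    return (s.capability, sign * s.safety)


def dominates(a: Scaffold, b: Scaffold, mode: str = "blue") -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    oa, ob = objectives(a, mode), objectives(b, mode)
    return all(x >= y for x, y in zip(oa, ob)) and any(x > y for x, y in zip(oa, ob))


def pareto_front(population: list[Scaffold], mode: str = "blue") -> list[Scaffold]:
    """Keep the scaffolds that no other scaffold in the population dominates."""
    return [
        s for s in population
        if not any(dominates(other, s, mode) for other in population)
    ]


# Illustrative scores only; these are not results from the paper.
population = [
    Scaffold("A", capability=0.80, safety=0.90),
    Scaffold("B", capability=0.70, safety=0.50),
    Scaffold("C", capability=0.90, safety=0.40),
    Scaffold("D", capability=0.60, safety=0.95),
]
blue_front = [s.name for s in pareto_front(population, "blue")]
red_front = [s.name for s in pareto_front(population, "red")]
```

In an evolutionary loop, a front like this would typically serve as the parent set for the next generation; how the actual framework selects and mutates scaffolds is documented in the linked repository.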