As large language models are applied across an expanding range of fields, the
effective identification of harmful content generation and the robustness of
guardrail mechanisms face growing challenges. This research evaluates the
guardrail effectiveness of GPT-4o, Grok-2 Beta, Llama 3.1 (405B), Gemini 1.5,
and Claude 3.5 Sonnet through black-box testing of seemingly ethical multi-step
jailbreak prompts. It conducts ethical attacks using an identical multi-step
prompt that simulates the scenario of "corporate middle managers competing for
promotions." The results show that the guardrails of all five LLMs were
bypassed and content containing verbal attacks was generated; Claude 3.5
Sonnet exhibited comparatively stronger resistance to the multi-step jailbreak
prompts. To ensure objectivity, the experimental procedure, the black-box test
code, and the enhanced guardrail code have been uploaded to the GitHub repository:
https://github.com/brucewang123456789/GeniusTrail.git.