The widespread deployment of large language models (LLMs) has raised critical
concerns over their vulnerability to jailbreak attacks, i.e., adversarial
prompts that bypass alignment mechanisms and elicit harmful or policy-violating
outputs. While proprietary models like GPT-4 have undergone extensive safety
evaluation, the robustness of emerging open-source alternatives such as
DeepSeek remains underexplored, despite their growing adoption in
real-world applications. In this paper, we present the first systematic
jailbreak evaluation of DeepSeek-series models, comparing them with GPT-3.5 and
GPT-4 using the HarmBench benchmark. We evaluate seven representative attack
strategies across 510 harmful behaviors categorized by both function and
semantic domain. Our analysis reveals that DeepSeek's Mixture-of-Experts (MoE)
architecture introduces routing sparsity that offers selective robustness
against optimization-based attacks such as TAP-T, but leads to significantly
higher vulnerability to prompt-based and manually engineered attacks. In
contrast, GPT-4 Turbo demonstrates stronger and more consistent safety
alignment across diverse behaviors, likely due to its dense Transformer design
and reinforcement learning from human feedback (RLHF). Fine-grained behavioral
analysis and case studies further show that DeepSeek often routes adversarial
prompts to under-aligned expert modules, resulting in inconsistent refusal
behaviors. These findings highlight a fundamental trade-off between
architectural efficiency and alignment generalization, and underscore the need for
targeted safety tuning and modular alignment strategies to ensure secure
deployment of open-source LLMs.