Large language models (LLMs) are increasingly deployed in real-world
applications, raising concerns about their security. While jailbreak attacks
highlight failures under overtly harmful queries, they overlook a critical
risk: incorrectly answering harmless-looking inputs can be dangerous and cause
real-world harm (Implicit Harm). We systematically reformulate the LLM risk
landscape through a structured quadrant perspective based on output factuality
and input harmlessness, uncovering an overlooked high-risk region. To
investigate this gap, we propose JailFlipBench, a benchmark aims to capture
implicit harm, spanning single-modal, multimodal, and factual extension
scenarios with diverse evaluation metrics. We further develop initial JailFlip
attack methodologies and conduct comprehensive evaluations across multiple
open-source and black-box LLMs, show that implicit harm present immediate and
urgent real-world risks, calling for broader LLM safety assessments and
alignment beyond conventional jailbreak paradigms.