We introduce a novel class of adversarial attacks on toxicity detection
models; the attacks exploit language models' inability to interpret spatially
structured text rendered as ASCII art. To evaluate the effectiveness of these attacks,
we propose ToxASCII, a benchmark designed to assess the robustness of toxicity
detection systems against visually obfuscated inputs. Our attacks achieve a
perfect Attack Success Rate (ASR) across a diverse set of state-of-the-art
large language models and dedicated moderation tools, revealing a significant
vulnerability in current text-only moderation systems.
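
As a rough illustration of the kind of obfuscation the attacks rely on, the sketch below renders a word as multi-line ASCII art using the pyfiglet library before it would be submitted to a text-only moderation system. This is a minimal example of the general idea, not the paper's attack pipeline; the moderation call is a hypothetical placeholder.

```python
# Minimal sketch: spatially structured (ASCII-art) text as seen by a
# text-only moderation model. Uses the pyfiglet library; the moderation
# call at the bottom is a hypothetical placeholder, not a real API.
import pyfiglet


def to_ascii_art(text: str, font: str = "standard") -> str:
    """Render `text` as multi-line ASCII art using a FIGlet font."""
    return pyfiglet.figlet_format(text, font=font)


if __name__ == "__main__":
    # Character shapes are only recognizable when the lines are viewed
    # together; read token by token, the input looks like punctuation,
    # slashes, and whitespace rather than words.
    obfuscated = to_ascii_art("example")
    print(obfuscated)
    # score = moderation_model.classify(obfuscated)  # hypothetical call
```

The point of the sketch is that the rendered string carries the same message to a human reader while presenting the classifier with a sequence of symbols whose meaning depends on two-dimensional layout rather than token content.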