Large language models (LLMs) have become indispensable for automated code
generation, yet the quality and security of their outputs remain a critical
concern. Existing studies predominantly concentrate on adversarial attacks or
inherent flaws within the models. However, a more prevalent yet underexplored
issue concerns how the quality of a benign but poorly formulated prompt affects
the security of the generated code. To investigate this, we first propose an
evaluation framework for prompt quality encompassing three key dimensions: goal
clarity, information completeness, and logical consistency. Based on this
framework, we construct and publicly release CWE-BENCH-PYTHON, a large-scale
benchmark dataset containing tasks with prompts categorized into four distinct
levels of normativity (L0-L3). Extensive experiments on multiple
state-of-the-art LLMs reveal a clear correlation: as prompt normativity
decreases, the likelihood of generating insecure code consistently and markedly
increases. Furthermore, we demonstrate that advanced prompting techniques, such
as Chain-of-Thought and Self-Correction, effectively mitigate the security
risks introduced by low-quality prompts, substantially improving code safety.
Our findings highlight that enhancing the quality of user prompts constitutes a
critical and effective strategy for strengthening the security of AI-generated
code.