Protecting cyber Intellectual Property (IP), such as web content, is an
increasingly critical concern. The rise of large language models (LLMs) with
online retrieval capabilities is a double-edged sword: it enables convenient
access to information but often undermines the rights of original content
creators. As users increasingly rely on LLM-generated responses, they engage
less directly with original information sources. This significantly reduces
the incentives for IP creators to contribute and leaves cyberspace
increasingly saturated with AI-generated content. In response,
we propose a novel defense framework that empowers web content creators to
safeguard their web-based IP from unauthorized LLM real-time extraction by
leveraging the semantic understanding capability of LLMs themselves. Our method
is grounded in principled motivations and effectively addresses an intractable
black-box optimization problem. Real-world experiments demonstrate that our
method improves defense success rates from 2.5% to 88.6% across different LLMs,
outperforming traditional defenses such as configuration-based restrictions.