LLMによる有害な応答を防ぐ、安全機構New

LLMが有害な応答をしないようにするための安全機構について解説します。本記事を読むことで、安全機構の仕組みについて理解を深めることができます。

Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability

Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal | Published: 2025-06-16

Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models

Authors: Arjun Krishna, Aaditya Rastogi, Erick Galinkin | Published: 2025-06-16

Watermarking LLM-Generated Datasets in Downstream Tasks

Authors: Yugeng Liu, Tianshuo Cong, Michael Backes, Zheng Li, Yang Zhang | Published: 2025-06-16

From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs

Authors: Alsharif Abuadbba, Chris Hicks, Kristen Moore, Vasilios Mavroudis, Burak Hasircioglu, Diksha Goel, Piers Jennings | Published: 2025-06-16

「AI Security Portal」(英語版)を公開しました

「AIセキュリティポータル」の英語版を公開しました。ぜひご覧ください。

Using LLMs for Security Advisory Investigations: How Far Are We?

Authors: Bayu Fedra Abdullah, Yusuf Sulistyo Nugroho, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto | Published: 2025-06-16

Detecting Hard-Coded Credentials in Software Repositories via LLMs

Authors: Chidera Biringa, Gokhan Kul | Published: 2025-06-16

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng | Published: 2025-06-12

Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework

Authors: Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Zhe Chen, Wei Ni, Jun Luo | Published: 2025-06-12