Literature Database

CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer
Authors: Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu, Kai Chen, Xuejing Yuan, Wangjun Zhang | Published: 2026-03-19
Tags: Alignment, Calculation of Output Harmfulness, Evaluation Method
Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
Authors: Harsh Chaudhari, Ethan Rathbum, Hanna Foerster, Jamie Hayes, Matthew Jagielski, Milad Nasr, Ilia Shumailov, Alina Oprea | Published: 2026-01-27
Tags: LLM Utilization, Data Contamination Detection, Calculation of Output Harmfulness
From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection
Authors: Chaomeng Lu, Bert Lagaisse | Published: 2025-12-11
Tags: Certified Robustness, Calculation of Output Harmfulness, Evaluation Method
Embedding Poisoning: Bypassing Safety Alignment via Embedding Semantic Shift
Authors: Shuai Yuan, Zhibo Zhang, Yuxi Li, Guangdong Bai, Wang Kailong | Published: 2025-09-08
Tags: Disabling Safety Mechanisms of LLM, Calculation of Output Harmfulness, Attack Detection Method
Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes
Authors: Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang | Published: 2025-08-18
Tags: Disabling Safety Mechanisms of LLM, Data Generation Method, Calculation of Output Harmfulness
Fake or Real: The Impostor Hunt in Texts for Space Operations
Authors: Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Przemysław Biecek, Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Artur Janicki, Evridiki Ntagiou | Published: 2025-07-17 | Updated: 2025-07-21
Tags: Data Toxicity, Detection of Misinformation, Calculation of Output Harmfulness
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng | Published: 2025-06-12
Tags: Data Collection Method, Prompt Leaking, Calculation of Output Harmfulness
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
Authors: Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen | Published: 2025-05-22
Tags: Alignment, Indirect Prompt Injection, Calculation of Output Harmfulness
SoK: Knowledge is All You Need: Accelerating Last Mile Delivery for Automated Provenance-based Intrusion Detection with LLMs
Authors: Wenrui Cheng, Tiantian Zhu, Chunlin Xiong, Haofei Sun, Zijun Wang, Shunan Jing, Mingqi Lv, Yan Chen | Published: 2025-03-05 | Updated: 2025-04-28
Tags: RAG, Calculation of Output Harmfulness, Attack Detection
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Authors: Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song | Published: 2024-05-27 | Updated: 2025-10-14
Tags: Privacy Enhancing Technology, Calculation of Output Harmfulness, Large Language Model