LLM性能評価

Detecting Training Data of Large Language Models via Expectation Maximization

Authors: Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang | Published: 2024-10-10
LLM性能評価
メンバーシップ推論

RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Authors: Di Cao, Yong Liao, Xiuwei Shang | Published: 2024-10-10
LLM性能評価
脆弱性管理

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou | Published: 2024-10-09
LLM性能評価
プロンプトインジェクション

Signal Watermark on Large Language Models

Authors: Zhenyu Xu, Victor S. Sheng | Published: 2024-10-09
LLM性能評価
ウォーターマーキング
透かし評価

Superficial Safety Alignment Hypothesis

Authors: Jianwei Li, Jung-Eun Kim | Published: 2024-10-07
LLM性能評価
安全性アライメント

DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech

Authors: Dominika Woszczyk, Soteris Demetriou | Published: 2024-10-05
LLM性能評価
プライバシー保護

A Watermark for Black-Box Language Models

Authors: Dara Bahri, John Wieting, Dana Alon, Donald Metzler | Published: 2024-10-02
LLM性能評価
ウォーターマーキング
透かし評価

PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach

Authors: Zhihao Lin, Wei Ma, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Yang Liu, Jun Wang, Li Li | Published: 2024-09-21 | Updated: 2024-10-03
LLM性能評価
プロンプトインジェクション

CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

Authors: Zeqing Qin, Yiwei Wu, Lansheng Han | Published: 2024-09-11
LLM性能評価
プログラム解析
プロンプトインジェクション

DrLLM: Prompt-Enhanced Distributed Denial-of-Service Resistance Method with Large Language Models

Authors: Zhenyu Yin, Shang Liu, Guangyuan Xu | Published: 2024-09-11 | Updated: 2025-01-13
DDoS攻撃検出
LLM性能評価
プロンプトインジェクション