LLM Performance Evaluation

Can LLM-Generated Misinformation Be Detected?

Authors: Canyu Chen, Kai Shu | Published: 2023-09-25 | Updated: 2024-04-23
Tags: LLM Performance Evaluation | Prompt Injection | Inappropriate Content Generation

Recovering from Privacy-Preserving Masking with Large Language Models

Authors: Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli | Published: 2023-09-12 | Updated: 2023-12-14
Tags: LLM Performance Evaluation | Data Protection Methods | Privacy Techniques

Evaluating Superhuman Models with Consistency Checks

Authors: Lukas Fluri, Daniel Paleka, Florian Tramèr | Published: 2023-06-16 | Updated: 2023-10-19
Tags: LLM Performance Evaluation | Algorithms | Evaluation Methods
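
A minimal sketch of the core idea behind this entry: even without ground-truth labels, a model's probability judgments can be audited for logical consistency, e.g. the probabilities it assigns to an event and its complement should sum to one. The `model_prob` function below is a hypothetical stand-in, not the paper's interface; the paper applies the same principle to settings such as superhuman chess evaluation and forecasting.

```python
# Hedged sketch: a consistency check that needs no ground-truth labels.
# `model_prob` is a toy stand-in for any model returning P(statement is true).

def model_prob(statement: str) -> float:
    # Deliberately inconsistent "model" for demonstration.
    fake = {
        "Team A beats Team B": 0.7,
        "Team B beats Team A": 0.5,  # a calibrated model would say ~0.3
    }
    return fake.get(statement, 0.5)

def complement_check(event: str, negation: str, tol: float = 0.05) -> bool:
    """P(event) + P(not event) should be ~1 for any coherent model."""
    return abs(model_prob(event) + model_prob(negation) - 1.0) <= tol

if __name__ == "__main__":
    ok = complement_check("Team A beats Team B", "Team B beats Team A")
    print("consistent" if ok else "inconsistency found")  # -> inconsistency found
```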

Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo | Published: 2023-06-15
Tags: LLM Performance Evaluation | Algorithms | Prompt Injection

Membership Inference Attacks against Language Models via Neighbourhood Comparison

Authors: Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick | Published: 2023-05-29 | Updated: 2023-08-07
Tags: LLM Performance Evaluation | Privacy Protection Methods | Defense Methods
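
A toy sketch of the neighbourhood-comparison idea in this title: rather than calibrating against a reference model, compare the target model's loss on a candidate text with its loss on slightly perturbed neighbours; training members tend to fit the model noticeably better than their neighbours do. The bigram model and character-swap perturbations below are illustrative stand-ins, not the paper's setup (which generates neighbour texts with a masked language model).

```python
# Hedged sketch of membership inference via neighbourhood comparison.
import math
import random
from collections import defaultdict

def train_bigram(corpus):
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
    return counts

def nll(counts, text, vocab=27):
    # Add-one smoothed negative log-likelihood per character.
    total = 0.0
    for a, b in zip(text, text[1:]):
        row = counts[a]
        total -= math.log((row[b] + 1) / (sum(row.values()) + vocab))
    return total / max(len(text) - 1, 1)

def neighbours(text, k=20):
    # Toy perturbation: replace one random character with a random letter.
    letters = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for _ in range(k):
        i = random.randrange(len(text))
        out.append(text[:i] + random.choice(letters) + text[i + 1:])
    return out

def membership_score(counts, text, k=20):
    """Positive score = the candidate fits the model better than its
    perturbed neighbours do, which is evidence of training membership."""
    neigh = sum(nll(counts, n) for n in neighbours(text, k)) / k
    return neigh - nll(counts, text)

random.seed(0)
member = "the cat sat on the mat"
model = train_bigram([member, "a dog ran in the park"])
print(membership_score(model, member))         # tends to be clearly positive
print(membership_score(model, "zq xv kj qq"))  # tends to be near zero
```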

LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

Authors: Xuanqi Liu, Zhuotao Liu | Published: 2023-05-28 | Updated: 2023-12-15
Tags: DNN IP Protection Methods | LLM Performance Evaluation | Privacy Protection Methods

The Curse of Recursion: Training on Generated Data Makes Models Forget

Authors: Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson | Published: 2023-05-27 | Updated: 2024-04-14
Tags: LLM Performance Evaluation | Sampling Methods | Model Interpretability
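
A toy illustration of the effect this title names: when each model generation is trained only on samples from the previous generation, the tails of the original distribution are progressively lost. A 1-D Gaussian stands in for the generative model here, and the 2-sigma clipping is an assumption that mimics generative models under-sampling rare events; the paper makes the analogous argument for LLMs trained on LLM output.

```python
# Hedged sketch of recursive training on generated data ("model collapse").
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000)     # original "human" data

for gen in range(10):
    mu, sigma = data.mean(), data.std()    # "train" this generation's model
    print(f"gen {gen}: sigma={sigma:.3f}")
    # The next generation trains only on model output; clipping at 2 sigma
    # mimics a generative model under-sampling its own tails, so the fitted
    # spread shrinks by roughly 12% per generation.
    samples = rng.normal(mu, sigma, size=1000)
    data = samples[np.abs(samples - mu) < 2 * sigma]
```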

On Evaluating Adversarial Robustness of Large Vision-Language Models

Authors: Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin | Published: 2023-05-26 | Updated: 2023-10-29
Tags: LLM Performance Evaluation | Prompt Injection | Adversarial Attack
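
For context, a minimal FGSM example, the simplest member of the attack family such robustness evaluations build on: perturb the input one step in the direction of the loss gradient's sign. The tiny linear classifier is a placeholder assumption, not one of the vision-language models the paper studies; on a random toy model the prediction may or may not flip, whereas on trained models FGSM reliably degrades accuracy.

```python
# Hedged sketch: fast gradient sign method (FGSM) on a toy classifier.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 3)          # toy stand-in for an image classifier
x = torch.rand(1, 16)                   # toy "image", pixels in [0, 1]
label = torch.tensor([0])

x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), label)
loss.backward()

eps = 0.1                               # L-infinity attack budget
x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", model(x).argmax().item())
print("adversarial prediction:", model(x_adv).argmax().item())
```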

Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage

Authors: Hanyin Shao, Jie Huang, Shen Zheng, Kevin Chen-Chuan Chang | Published: 2023-05-22 | Updated: 2024-02-09
Tags: LLM Performance Evaluation | Privacy Violation | Privacy Protection Methods