Reward Mechanism Design

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn | Published: 2023-05-29 | Updated: 2024-07-29
Alignment
Reward Mechanism Design
Reinforcement Learning Optimization

RRHF: Rank Responses to Align Language Models with Human Feedback without tears

Authors: Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang | Published: 2023-04-11 | Updated: 2023-10-07
Alignment
Reward Mechanism Design
Reinforcement Learning Optimization

IronForge: An Open, Secure, Fair, Decentralized Federated Learning

Authors: Guangsheng Yu, Xu Wang, Caijun Sun, Qin Wang, Ping Yu, Wei Ni, Ren Ping Liu, Xiwei Xu | Published: 2023-01-07
Privacy-Preserving Techniques
Prompt Injection
Reward Mechanism Design

Ares: A System-Oriented Wargame Framework for Adversarial ML

Authors: Farhan Ahmed, Pratik Vaishnavi, Kevin Eykholt, Amir Rahmati | Published: 2022-10-24
Poisoning
Reward Mechanism Design
Evaluation Methods

Blockchain and Machine Learning for Fraud Detection: A Privacy-Preserving and Adaptive Incentive Based Approach

Authors: Tahmid Hasan Pranto, Kazi Tamzid Akhter Md Hasib, Tahsinur Rahman, AKM Bahalul Haque, A. K. M. Najmul Islam, Rashedur M. Rahman | Published: 2022-10-23
Integration of Blockchain and FL
Fraudulent Transactions
Reward Mechanism Design

Reinforcement Learning for Hardware Security: Opportunities, Developments, and Challenges

Authors: Satwik Patnaik, Vasudev Gohil, Hao Guo, Jeyavijayan Rajendran | Published: 2022-08-29
Reward Mechanism Design
Optimization Problems
Machine Learning Techniques

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

Authors: Anshuka Rangi, Haifeng Xu, Long Tran-Thanh, Massimo Franceschetti | Published: 2022-08-29
Cyber Attacks
Reward Mechanism Design
Optimization Problems

Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Authors: Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, Milind Tambe | Published: 2020-09-14 | Updated: 2024-04-26
Reward Mechanism Design
Performance Evaluation Metrics
Selection and Evaluation of Optimization Algorithms

Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning

Authors: Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma | Published: 2020-02-20
Reward Mechanism Design
Vulnerability Prediction
Defense Methods

Pseudo Random Number Generation: a Reinforcement Learning approach

Authors: Luca Pasqualini, Maurizio Parton | Published: 2019-12-15
Data Generation
Reward Mechanism Design
Deep Reinforcement Learning