AI Security Portal K Program
Censoring chemical data to mitigate dual use risk
Abstract
The dual use of machine learning applications, where models can be used for both beneficial and malicious purposes, presents a significant challenge. This has recently become a particular concern in chemistry, where chemical datasets containing sensitive labels (e.g. toxicological information) could be used to develop predictive models that identify novel toxins or chemical warfare agents. To mitigate dual use risks, we propose a model-agnostic method of selectively noising datasets while preserving the utility of the data for training deep neural networks in a beneficial region. We evaluate the effectiveness of the proposed method on three models: least squares, a multilayer perceptron, and a graph neural network. Our findings show that selectively noised datasets can induce controlled variance and bias in model predictions for sensitive labels, suggesting that the safe sharing of datasets containing sensitive information is feasible. We also find that omitting sensitive data often increases model variance sufficiently to mitigate dual use. This work is proposed as a foundation for future research on enabling more secure and collaborative data sharing practices and safer machine learning applications in chemistry.
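The idea of selective noising can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic one-dimensional data, the label threshold that defines the "sensitive" region, the Gaussian noise model, and the helper name `selectively_noise` are all assumptions made for illustration. The sketch perturbs only labels in the sensitive region, leaves the rest untouched, and checks that a least squares fit on the untouched region still recovers the underlying trend.

```python
import numpy as np


def selectively_noise(y, sensitive_mask, scale, rng):
    """Return a copy of y with Gaussian noise added only where sensitive_mask is True.

    Hypothetical helper for illustration; the noise model is an assumption.
    """
    y_noised = np.asarray(y, dtype=float).copy()
    n_sensitive = int(np.count_nonzero(sensitive_mask))
    y_noised[sensitive_mask] += rng.normal(0.0, scale, size=n_sensitive)
    return y_noised


rng = np.random.default_rng(0)

# Synthetic stand-in for a (feature, label) dataset; not real chemical data.
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 0.5

# Assumed definition of the sensitive region: labels above a threshold.
sensitive = y > 1.5

# Dataset as it would be shared: sensitive labels noised, others unchanged.
y_shared = selectively_noise(y, sensitive, scale=1.0, rng=rng)

# Least squares fit restricted to the untouched (beneficial) region.
slope, intercept = np.polyfit(x[~sensitive], y_shared[~sensitive], deg=1)
```

In this toy setting the labels outside the sensitive region are bit-for-bit identical to the originals, so a model trained there is unaffected, while any model fit to the sensitive region sees labels corrupted at the chosen noise scale.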