Abstract
The dual use of machine learning applications, where models can be used for
both beneficial and malicious purposes, presents a significant challenge. This
has recently become a particular concern in chemistry, where chemical datasets
containing sensitive labels (e.g. toxicological information) could be used to
develop predictive models that identify novel toxins or chemical warfare
agents. To mitigate dual-use risks, we propose a model-agnostic method that
selectively adds noise to datasets while preserving the utility of the data for
training deep neural networks in a beneficial region. We evaluate the
effectiveness of the proposed method across least squares, a multilayer
perceptron, and a graph neural network. Our findings show that selectively
noised datasets can induce a controllable amount of model variance and bias in
predictions for sensitive labels, suggesting that the safe sharing of datasets
containing sensitive information is feasible. We also find that omitting
sensitive data often increases model variance sufficiently to mitigate dual
use. This work is proposed as a
foundation for future research on enabling more secure and collaborative data
sharing practices and safer machine learning applications in chemistry.
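
As a rough illustration of what selectively noising a dataset could look like, the sketch below perturbs only the labels that fall in an assumed sensitive region and leaves the rest untouched. The function name `selectively_noise`, the threshold rule for marking labels as sensitive, the Gaussian noise model, and all numeric values are assumptions made for illustration; none of them is specified in this abstract.

```python
import numpy as np

def selectively_noise(y, threshold, noise_scale, rng):
    """Add Gaussian noise only to labels above `threshold`.

    Assumption: labels above `threshold` form the sensitive region and
    labels at or below it form the beneficial region.
    """
    y = np.asarray(y, dtype=float)
    sensitive = y > threshold          # boolean mask of the sensitive region
    noised = y.copy()                  # leave the caller's array unchanged
    noised[sensitive] += rng.normal(0.0, noise_scale, size=int(sensitive.sum()))
    return noised, sensitive

# Hypothetical labels; larger values stand in for more sensitive examples.
rng = np.random.default_rng(0)
labels = np.array([0.1, 0.4, 0.9, 1.5, 2.2])
noised_labels, sensitive_mask = selectively_noise(
    labels, threshold=1.0, noise_scale=0.5, rng=rng
)
```

In this sketch, `noise_scale` is the knob that would control how much predictions in the sensitive region degrade, while labels in the beneficial region are passed through unchanged.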