arxiv
A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network
The application of Shapley values to high-dimensional, time-series-like data
is computationally challenging and sometimes infeasible. For $N$ inputs, exact
computation requires evaluating on the order of $2^N$ feature subsets. In image
processing, clusters of pixels, referred to as
superpixels, are used to streamline computations. This research presents an
efficient solution for time-series-like data that adapts the idea of superpixels
for Shapley value computation. Motivated by a forensic DNA classification
example, the method is applied to multivariate time-series-like data whose
features have been classified by a convolutional neural network (CNN). In DNA
processing, it is important to identify alleles from the background noise
created by DNA extraction and processing. A single DNA profile has $31{,}200$
scan points to classify, and the classification decisions must be defensible in
a court of law. This means that classification is routinely performed by human
readers, a monumental and time-consuming process. The application of a CNN
with fast computation of meaningful Shapley values provides a potential
alternative to manual classification. This research demonstrates the realistic,
accurate, and fast computation of Shapley values for this massive task.
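
To make the superpixel analogy concrete, the following is a minimal Python sketch of exact Shapley values computed over contiguous groups of scan points instead of individual points. It is an illustration only, not the authors' implementation: the function names `segment_indices` and `grouped_shapley`, the equal-width windows, and the use of a fixed baseline to mask absent segments are all assumptions made for this example. With $m$ segments the number of distinct coalitions is $2^m$, which is tractable for small $m$ even when $N$ is in the tens of thousands.

```python
from itertools import combinations
from math import factorial

import numpy as np


def segment_indices(n_points, n_segments):
    """Split range(n_points) into contiguous, nearly equal windows (assumed grouping)."""
    return np.array_split(np.arange(n_points), n_segments)


def grouped_shapley(model, x, baseline, segments):
    """Exact Shapley values where each segment of scan points acts as one player.

    model    : callable mapping an array shaped like x to a scalar score
    x        : the input to explain (time axis first)
    baseline : array shaped like x, used to mask absent segments (an assumption)
    segments : list of index arrays along the time axis
    """
    m = len(segments)
    cache = {}

    def value(coalition):
        # Segments in the coalition keep their real values; all others are
        # replaced by the baseline. Results are cached per coalition.
        if coalition not in cache:
            masked = baseline.copy()
            for j in coalition:
                masked[segments[j]] = x[segments[j]]
            cache[coalition] = float(model(masked))
        return cache[coalition]

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                with_i = tuple(sorted(subset + (i,)))
                phi[i] += weight * (value(with_i) - value(subset))
    return phi


if __name__ == "__main__":
    # Toy stand-in for a classifier score: a linear model over 24 scan points.
    rng = np.random.default_rng(0)
    x = rng.normal(size=24)
    w = rng.normal(size=24)
    segments = segment_indices(len(x), 4)
    phi = grouped_shapley(lambda z: z @ w, x, np.zeros_like(x), segments)
    print(phi)
```

In this sketch the attributions sum to the difference between the model output on the full input and on the baseline, which is the usual efficiency property of Shapley values. How segments should be chosen for real DNA profiles, and what a sensible baseline is, are questions the sketch does not address.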