A novel application of Shapley values for large multidimensional time-series data: Applying explainable AI to a DNA profile classification neural network

Abstract

The application of Shapley values to high-dimensional, time-series-like data is computationally challenging, and sometimes impossible. For N inputs, exact computation requires evaluating on the order of 2^N feature subsets. In image processing, clusters of pixels, referred to as superpixels, are used to streamline computations. This research presents an efficient solution for time-series-like data that adapts the idea of superpixels for Shapley value computation. Motivated by a forensic DNA classification example, the method is applied to multivariate time-series-like data whose features have been classified by a convolutional neural network (CNN). In DNA processing, it is important to distinguish alleles from the background noise created by DNA extraction and processing. A single DNA profile has 31,200 scan points to classify, and the classification decisions must be defensible in a court of law. This means that classification is routinely performed by human readers, a monumental and time-consuming process. The application of a CNN with fast computation of meaningful Shapley values provides a potential alternative to manual classification. This research demonstrates the realistic, accurate and fast computation of Shapley values for this massive task.
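
To illustrate why grouping scan points helps, the following is a minimal sketch of exact Shapley values computed over contiguous segments of a one-dimensional signal, so that the cost grows with the number of segments rather than the number of scan points. It is not the method of this research: the function name `segment_shapley`, the equal-width segmentation, the all-zero baseline, and the toy additive `model` standing in for a CNN score are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def segment_shapley(model, x, baseline, n_segments):
    """Exact Shapley values where each player is a contiguous segment of x.

    Hypothetical helper: equal-width segments and baseline masking are
    illustrative assumptions, not the segmentation used in the paper.
    """
    # Group scan-point indices into contiguous blocks (the 1-D analogue of superpixels).
    segments = np.array_split(np.arange(len(x)), n_segments)
    players = range(n_segments)

    def value(coalition):
        # Segments in the coalition keep their observed values;
        # all other segments are replaced by the baseline.
        masked = baseline.copy()
        for s in coalition:
            masked[segments[s]] = x[segments[s]]
        return model(masked)

    phi = np.zeros(n_segments)
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude segment i.
            weight = factorial(k) * factorial(n_segments - k - 1) / factorial(n_segments)
            for subset in combinations(others, k):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Toy stand-in for a classifier score (assumption): the sum of the signal.
rng = np.random.default_rng(0)
x = rng.normal(size=40)
baseline = np.zeros_like(x)
model = lambda s: float(s.sum())

phi = segment_shapley(model, x, baseline, n_segments=5)
```

With 5 segments the loop considers 2^5 = 32 coalitions instead of 2^40 for the individual points. For this additive toy model, each segment's Shapley value equals the sum of the signal within that segment, which gives a simple sanity check.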
