Li Xiaoxiao, Dvornek Nicha C, Zhou Yuan, Zhuang Juntang, Ventola Pamela, Duncan James S
Biomedical Engineering, Yale University, New Haven, CT USA.
Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT USA.
Inf Process Med Imaging. 2019 Jun;11492:718-730. doi: 10.1007/978-3-030-20351-1_56. Epub 2019 May 22.
Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understanding just what image features the network is using, which can in turn be used to define the biomarkers. Current methods extract biomarkers, i.e., important features, by looking at how the prediction changes when one feature at a time is "ignored". However, this can lead to serious errors if the features are conditionally dependent. In this work, we go beyond looking at only individual features by using Shapley value explanation (SVE) from cooperative game theory. Cooperative game theory is advantageous here because it directly considers the interaction between features and can be applied to any machine learning method, making it a novel, more accurate way of determining instance-wise biomarker importance from deep learning models. A barrier to using SVE is its computational complexity: 2^n given n features. We explicitly reduce the complexity of SVE computation by two approaches based on the underlying graph structure of the input data: 1) only consider the centralized coalition of each feature; 2) a hierarchical pipeline which first clusters features into small communities, then applies SVE in each community. Monte Carlo approximation can be used for large permutation sets. We first validate our methods on the MNIST dataset and compare to human perception. Next, to ensure plausibility of our biomarker results, we train a Random Forest (RF) to classify ASD/control subjects from fMRI and compare SVE results to standard RF-based feature importance. Finally, we show initial results on ranked fMRI biomarkers using SVE on a deep learning classifier for the ASD/control dataset.
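To make the 2^n cost and the Monte Carlo alternative concrete, the following is a minimal Python sketch of generic Shapley value computation. It is not the authors' graph-based method: the function names (`exact_shapley`, `monte_carlo_shapley`) and the three-feature toy game `toy_value` are hypothetical, chosen only for illustration. In the toy game, features 0 and 1 contribute only jointly, so a method that adds one feature at a time to an empty set would score both as unimportant, whereas the Shapley value splits their joint contribution.

```python
import itertools
import math
import random


def exact_shapley(value, n):
    """Exact Shapley values for n features.

    `value` maps a frozenset of feature indices to a number.
    Each feature requires 2^(n-1) coalition evaluations, so this
    is only feasible for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition without feature i
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def monte_carlo_shapley(value, n, num_permutations=2000, seed=0):
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled feature orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for i in order:
            coalition.add(i)
            cur = value(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return [p / num_permutations for p in phi]


def toy_value(coalition):
    """Hypothetical game: features 0 and 1 only matter together;
    feature 2 contributes on its own."""
    v = 0.0
    if 0 in coalition and 1 in coalition:
        v += 1.0
    if 2 in coalition:
        v += 0.5
    return v


exact = exact_shapley(toy_value, 3)
approx = monte_carlo_shapley(toy_value, 3)
print("exact :", exact)
print("approx:", approx)
```

In both functions the attributions sum to the value of the full feature set, which is the efficiency property of the Shapley value. The sampling version trades exactness for a cost that grows with the number of sampled permutations rather than with 2^n.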