Vidovic Marina M-C, Kloft Marius, Müller Klaus-Robert, Görnitz Nico
Machine Learning Group, Technical University of Berlin, Berlin, Germany.
Department of Computer Science, Humboldt University of Berlin, Berlin, Germany.
PLoS One. 2017 Mar 27;12(3):e0174392. doi: 10.1371/journal.pone.0174392. eCollection 2017.
High prediction accuracy is not the only objective to consider when solving problems with machine learning; some scientific applications also require an explanation of the learned prediction function. In computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decisions of support vector machines (SVMs) that use weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method was devised and has shown promising results on real-world data. Our contribution in this paper is twofold. First, as an extension of POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs), and devise a sampling strategy for its efficient computation. Second, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.