ML2Motif-Reliable extraction of discriminative sequence motifs from learning machines.

Authors

Vidovic Marina M-C, Kloft Marius, Müller Klaus-Robert, Görnitz Nico

Affiliations

Machine Learning Group, Technical University of Berlin, Berlin, Germany.

Department of Computer Science, Humboldt University of Berlin, Berlin, Germany.

Publication information

PLoS One. 2017 Mar 27;12(3):e0174392. doi: 10.1371/journal.pone.0174392. eCollection 2017.

DOI:10.1371/journal.pone.0174392

PMID:28346487

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5367830/

Abstract

High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
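The idea of scoring a positional oligomer by sampling can be illustrated with a minimal sketch. It is not the authors' gPOIM implementation. It assumes the classical POIM-style quantity, the expected model score given that a k-mer sits at a position minus the unconditional expected score, and estimates it by Monte Carlo over uniformly random background sequences. The names `toy_score` and `sampled_importance`, the uniform background, and the sample count are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = np.array(list("ACGT"))


def toy_score(seqs):
    """Hypothetical stand-in for any trained model's scoring function.

    Returns 1.0 for a sequence that contains 'GATT' starting at index 5,
    and 0.0 otherwise.
    """
    return np.array([1.0 if "".join(s[5:9]) == "GATT" else 0.0 for s in seqs])


def sampled_importance(score_fn, kmer, pos, seq_len, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[s(X) | X[pos:pos+k] = kmer] - E[s(X)].

    Assumes a uniform i.i.d. background over ACGT (an illustrative choice).
    The same random sequences are used for both terms, so only the planted
    k-mer differs between them.
    """
    rng = np.random.default_rng(seed)
    base = rng.choice(ALPHABET, size=(n_samples, seq_len))
    planted = base.copy()
    planted[:, pos:pos + len(kmer)] = list(kmer)  # force the k-mer at pos
    return float(score_fn(planted).mean() - score_fn(base).mean())


if __name__ == "__main__":
    print(sampled_importance(toy_score, "GATT", pos=5, seq_len=20))
    print(sampled_importance(toy_score, "GATT", pos=0, seq_len=20))
```

Because `score_fn` is treated as a black box, the same estimator applies to an SVM, a CNN, or any other model that maps sequences to scores. In this toy setting the k-mer planted at the position the model looks at receives a large positive value, while the same k-mer planted elsewhere receives zero.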

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b80/5367830/3f44e2e9beb2/pone.0174392.g001.jpg

Similar articles

ML2Motif-Reliable extraction of discriminative sequence motifs from learning machines.

PLoS One. 2017 Mar 27;12(3):e0174392. doi: 10.1371/journal.pone.0174392. eCollection 2017.

SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor.

PLoS One. 2015 Dec 21;10(12):e0144782. doi: 10.1371/journal.pone.0144782. eCollection 2015.

POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors.

Bioinformatics. 2008 Jul 1;24(13):i6-14. doi: 10.1093/bioinformatics/btn170.

Learning interpretable SVMs for biological sequence classification.

BMC Bioinformatics. 2006 Mar 20;7 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-7-S1-S9.

Correlation kernels for support vector machines classification with applications in cancer data.

Comput Math Methods Med. 2012;2012:205025. doi: 10.1155/2012/205025. Epub 2012 Aug 7.

DiscMLA: An Efficient Discriminative Motif Learning Algorithm over High-Throughput Datasets.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1810-1820. doi: 10.1109/TCBB.2016.2561930. Epub 2016 May 3.

Gene Classification Based on Multi-Class SVMs with Systematic Sampling and Hierarchical Clustering (SSHC) Algorithm.

Adv Exp Med Biol. 2021;1338:231-237. doi: 10.1007/978-3-030-78775-2_28.

Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites.

BMC Bioinformatics. 2004 Oct 28;5:169. doi: 10.1186/1471-2105-5-169.

The feature selection bias problem in relation to high-dimensional gene data.

Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.

MOCCA: a flexible suite for modelling DNA sequence motif occurrence combinatorics.

BMC Bioinformatics. 2021 May 7;22(1):234. doi: 10.1186/s12859-021-04143-2.

Cited by

Investigation of the Solubility of Elemental Sulfur (S) in Sulfur-Containing Natural Gas with Machine Learning Methods.

Int J Environ Res Public Health. 2023 Mar 13;20(6):5059. doi: 10.3390/ijerph20065059.

Interpretable machine learning for genomics.

Hum Genet. 2022 Sep;141(9):1499-1513. doi: 10.1007/s00439-021-02387-9. Epub 2021 Oct 20.
