The Resolved Mutual Information Function as a Structural Fingerprint of Biomolecular Sequences for Interpretable Machine Learning Classifiers.

Author information

Bohnsack Katrin Sophie, Kaden Marika, Abel Julia, Saralajew Sascha, Villmann Thomas

Affiliations

Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, 09648 Mittweida, Germany.

Bosch Center for Artificial Intelligence, 71272 Renningen, Germany.

Publication information

Entropy (Basel). 2021 Oct 17;23(10):1357. doi: 10.3390/e23101357.

DOI:10.3390/e23101357

PMID:34682081

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8534762/

Abstract

In this article, we propose using variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider resolved mutual information functions based on Shannon, Rényi, and Tsallis entropy. Combined with interpretable machine learning classifiers based on generalized learning vector quantization, this yields a powerful methodology for sequence classification: the model-inherent robustness supports high classification ability, and the interpretable model additionally allows substantial knowledge extraction. Any potential (slightly) inferior performance of the classifier is compensated by the additional knowledge that interpretable models provide. This knowledge may help the user analyze and understand the data and the task at hand. After a theoretical justification of the concepts, we demonstrate the approach on several example data sets covering different areas of biomolecular sequence analysis.
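
As a rough illustration of the kind of fingerprint the abstract refers to, the following minimal sketch computes the plain Shannon mutual information function of a symbol sequence, that is, the mutual information between symbols that lie k positions apart, for k = 1..max_lag. The function name, the parameter names, and the choice of base-2 logarithm are assumptions made for this example; the resolved, Rényi, and Tsallis variants considered in the article are not implemented here, and this is not the authors' code.

```python
from collections import Counter
from math import log2


def mutual_information_function(seq, max_lag):
    """Shannon mutual information (in bits) between symbols k positions
    apart, for k = 1..max_lag. Hypothetical helper; requires max_lag < len(seq)."""
    n = len(seq)
    values = []
    for k in range(1, max_lag + 1):
        total = n - k                        # number of symbol pairs at lag k
        pairs = Counter(zip(seq, seq[k:]))   # joint counts of (x_i, x_{i+k})
        left = Counter(seq[:n - k])          # marginal counts of the first symbol
        right = Counter(seq[k:])             # marginal counts of the second symbol
        mi = 0.0
        for (a, b), count in pairs.items():
            p_ab = count / total
            p_a = left[a] / total
            p_b = right[b] / total
            mi += p_ab * log2(p_ab / (p_a * p_b))
        values.append(mi)
    return values


if __name__ == "__main__":
    # Toy nucleotide sequence with period 4, used only for demonstration.
    fingerprint = mutual_information_function("ACGT" * 50, max_lag=8)
    print([round(v, 3) for v in fingerprint])
```

The resulting vector has a fixed length (max_lag) regardless of sequence length, which is what would make such a function usable as input to a vector-based classifier; the marginals are estimated from the same pairs as the joint distribution so that each value is a proper, non-negative mutual information.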

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4654/8534762/844fe85e2611/entropy-23-01357-g001.jpg
