Explainable deep neural networks for novel viral genome prediction.

Author information

Dasari Chandra Mohan, Bhukya Raju

Affiliation

National Institute of Technology, Warangal, Telangana 506004 India.

Publication information

Appl Intell (Dordr). 2022;52(3):3002-3017. doi: 10.1007/s10489-021-02572-3. Epub 2021 Jun 25.

DOI:10.1007/s10489-021-02572-3

PMID:34764607

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8232563/

Abstract

Viral infections cause a wide variety of human diseases, including cancer and COVID-19. Viruses invade host cells and interact with host molecules, potentially disrupting normal host functions and leading to fatal diseases. Novel viral genome prediction is crucial for understanding complex viral diseases such as AIDS and Ebola. Most existing computational techniques classify viral genomes, but the effectiveness of that classification depends solely on the structural features extracted. State-of-the-art deep neural network (DNN) models achieve excellent performance by extracting classification features automatically, but their explainability is relatively poor. The proposed CNN- and CNN-LSTM-based methods, EdeepVPP and EdeepVPP-hybrid, extract features automatically during model training for viral prediction. EdeepVPP is also interpretable: through its learned filters, it extracts from the feature maps of viral sequences the vital, biologically relevant patterns (features) that are most important for identifying viral genomes. Using 10-fold cross-validation on 19 human metagenomic contig experiment datasets, the EdeepVPP-hybrid predictor outperforms all existing methods, achieving a mean AUC-ROC of 0.992 and an AUC-PR of 0.990. We evaluate the ability of the CNN filters to detect patterns with high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. The model can also serve as a recommendation system for further analysis of raw sequences that alignment-based methods label as 'unknown'. We show that our interpretable model can extract, through its learned filters, the patterns considered most important for predicting viral sequences.
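The abstract states that EdeepVPP recovers important patterns from its learned convolutional filters and their high activation values, but it does not spell out the procedure. The sketch below illustrates one common filter-interpretation approach under stated assumptions: sequences are one-hot encoded, a single k x 4 filter is slid across them with a ReLU, every k-mer whose activation reaches a fraction `frac` (assumed 0.5) of the filter's maximum activation is collected, and the hits are summarised as a consensus string. The functions `one_hot`, `activations` and `filter_motif`, the threshold rule and the hand-set TATA filter are hypothetical illustrations, not the authors' code.

```python
from collections import Counter

BASES = "ACGT"  # assumed column order of the one-hot encoding


def one_hot(seq):
    """One-hot encode a DNA string; non-ACGT symbols such as N become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def activations(seq, filt, bias=0.0):
    """Valid 1-D convolution of a single k x 4 filter over one sequence, followed by ReLU."""
    x = one_hot(seq)
    k = len(filt)
    out = []
    for i in range(len(x) - k + 1):
        score = bias + sum(filt[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        out.append(max(0.0, score))
    return out


def filter_motif(seqs, filt, frac=0.5):
    """Collect k-mers whose activation reaches `frac` of the filter's maximum
    activation over all sequences, then summarise them as a consensus string.

    The `frac` threshold rule is an illustrative assumption, not EdeepVPP's criterion.
    """
    k = len(filt)
    acts = [activations(s, filt) for s in seqs]
    max_act = max((a for row in acts for a in row), default=0.0)
    if max_act == 0.0:
        return [], ""  # the filter never fires on these sequences
    hits = [s[i:i + k].upper()
            for s, row in zip(seqs, acts)
            for i, a in enumerate(row) if a >= frac * max_act]
    # Position frequency matrix: base counts at each of the k motif positions.
    pfm = [Counter(hit[j] for hit in hits) for j in range(k)]
    consensus = "".join(col.most_common(1)[0][0] for col in pfm)
    return hits, consensus


if __name__ == "__main__":
    # Hypothetical hand-set filter rewarding T-A-T-A and penalising every other base;
    # in a trained network these weights would be learned by the first conv layer.
    tata = [[1.0 if b == c else -1.0 for b in BASES] for c in "TATA"]
    seqs = ["GGTATAGG", "CCCCCCCC", "ATATACGT", "CCTATGCC"]
    print(filter_motif(seqs, tata))  # (['TATA', 'TATA', 'TATG'], 'TATA')
```

In a trained model the filter weights would come from the first convolutional layer, and the collected k-mers would usually be turned into a position weight matrix for comparison against known motifs; both steps lie outside this sketch.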

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/099a/8232563/1bc1b12fbc04/10489_2021_2572_Fig1_HTML.jpg

Similar articles

Explainable deep neural networks for novel viral genome prediction.

Appl Intell (Dordr). 2022;52(3):3002-3017. doi: 10.1007/s10489-021-02572-3. Epub 2021 Jun 25.

EDeepSSP: Explainable deep neural networks for exact splice sites prediction.

J Bioinform Comput Biol. 2020 Aug;18(4):2050024. doi: 10.1142/S0219720020500249. Epub 2020 Jul 22.

fMRI volume classification using a 3D convolutional neural network robust to shifted and scaled neuronal activations.

Neuroimage. 2020 Dec;223:117328. doi: 10.1016/j.neuroimage.2020.117328. Epub 2020 Sep 5.

CEFEs: A CNN Explainable Framework for ECG Signals.

Artif Intell Med. 2021 May;115:102059. doi: 10.1016/j.artmed.2021.102059. Epub 2021 Mar 26.

Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions.

BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):54. doi: 10.1186/s12918-016-0302-3.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences.

Molecules. 2018 Aug 1;23(8):1923. doi: 10.3390/molecules23081923.

Knowledge-driven feature component interpretable network for motor imagery classification.

J Neural Eng. 2022 Feb 18;19(1). doi: 10.1088/1741-2552/ac463a.

SpliceFinder: ab initio prediction of splice sites using convolutional neural network.

BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):652. doi: 10.1186/s12859-019-3306-3.

deepNEC: a novel alignment-free tool for the identification and classification of nitrogen biochemical network-related enzymes using deep learning.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac071.

Cited by

Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts.

ArXiv. 2025 Jul 18:arXiv:2507.09754v2.

A review of neural networks for metagenomic binning.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf065.

Advancements in Viral Genomics: Gated Recurrent Unit Modeling of SARS-CoV-2, SARS, MERS, and Ebola viruses.

Rev Soc Bras Med Trop. 2025 Feb 7;58:e004012024. doi: 10.1590/0037-8682-0178-2024. eCollection 2025.

VirDetect-AI: a residual and convolutional neural network-based metagenomic tool for eukaryotic viral protein identification.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf001.

DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences.

Curr Genomics. 2024 May 31;25(3):185-201. doi: 10.2174/0113892029268176240125055419. Epub 2024 Feb 1.

Deep learning guided prediction modeling of dengue virus evolving serotype.

Heliyon. 2024 May 29;10(11):e32061. doi: 10.1016/j.heliyon.2024.e32061. eCollection 2024 Jun 15.

Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective.

Curr Genomics. 2022 Nov 18;23(5):299-317. doi: 10.2174/1389202923666220927105311.

COVID-19 diagnosis via chest X-ray image classification based on multiscale class residual attention.

Comput Biol Med. 2022 Oct;149:106065. doi: 10.1016/j.compbiomed.2022.106065. Epub 2022 Sep 1.

AMAISE: a machine learning approach to index-free sequence enrichment.

Commun Biol. 2022 Jun 9;5(1):568. doi: 10.1038/s42003-022-03498-3.

Using amino acids co-occurrence matrices and explainability model to investigate patterns in dengue virus proteins.

BMC Bioinformatics. 2022 Feb 19;23(1):80. doi: 10.1186/s12859-022-04597-y.

