Misclassified: identification of zoonotic transition biomarker candidates for influenza A viruses using deep neural network.

Author information

Hatibi Nissrine, Dumont-Lagacé Maude, Alouani Zakaria, El Fatimy Rachid, Abik Mounia, Daouda Tariq

Affiliations

Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes, Mohammed V University in Rabat, Rabat, Morocco.

Institute of Biological Sciences (ISSB), UM6P Faculty of Medical Sciences, Mohammed VI Polytechnic University, Ben Guerir, Morocco.

Publication information

Front Genet. 2023 Jul 27;14:1145166. doi: 10.3389/fgene.2023.1145166. eCollection 2023.

Abstract

Zoonotic transition of influenza A viruses causes epidemics with high rates of morbidity and mortality. Predicting from their genetic sequence which viral strains are likely to make this transition could help prevent and respond to these zoonotic strains. We hypothesized that features predictive of viral hosts could be leveraged to identify biomarkers of zoonotic viral transition. We trained deep learning models to predict viral hosts from the virus mRNA or protein sequences. Our multi-host dataset contained 848,630 unique nucleotide sequences obtained from the NCBI Influenza Virus and Influenza Research Databases. Each sequence, representing one gene from one viral strain, was classified into one of three host categories: Avian, Human, and Swine. Trained models were analyzed using various neural network interpretation methods to identify candidate biomarkers of zoonotic transition. Using mRNA sequences as input led to higher prediction accuracy than using amino acid sequences, suggesting that the codon sequence contains information relevant to viral hosts that is lost during protein translation. UMAP visualization of the latent space of our classifiers showed that viral sequences clustered according to their host of origin. Interestingly, sequences from pandemic zoonotic viral strains localized at the margins between hosts, while zoonotic sequences incapable of Human-to-Human transmission localized with non-zoonotic viruses from the same host. In addition, host prediction accuracy was low for pandemic zoonotic sequences, which was not the case for the other zoonotic strains. This supports our hypothesis that ambiguously predicted viral sequences bear features associated with cross-species infectivity. Finally, we compared misclassified sequences to well-classified ones to extract candidate biomarkers of zoonotic transition. While features varied significantly between pairs of species and viral genes, several codons were conserved in Swine-to-Human and Avian-to-Human misclassified sequences, particularly in the NA, HA, and NP genes, suggesting their importance for zoonosis in Humans. Analysis of viral sequences using neural network interpretation approaches revealed important genetic differences between zoonotic viruses with pandemic potential and both non-zoonotic viral strains and zoonotic viruses incapable of Human-to-Human transmission.
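
The observation that mRNA input outperforms amino acid input can be illustrated with a small sketch. This is not the authors' pipeline; it only shows, on two made-up sequences, how synonymous codons collapse to the same protein, so a model that sees only amino acids cannot distinguish them. The sequences, the function names, and the partial codon table (a subset of the standard genetic code) are assumptions chosen for illustration.

```python
# Illustrative subset of the standard genetic code (RNA codons -> amino acids).
# Only the codons used in the toy sequences below are included.
CODON_TO_AA = {
    "AUG": "M",
    "GCU": "A", "GCC": "A",
    "AAA": "K", "AAG": "K",
    "UUU": "F", "UUC": "F",
}


def split_codons(mrna):
    """Split an in-frame mRNA string into codons, ignoring a trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TO_AA[c] for c in codons)


# Two hypothetical sequences that differ only at synonymous positions.
seq_a = "AUGGCUAAAUUU"
seq_b = "AUGGCCAAGUUC"

codons_a = split_codons(seq_a)
codons_b = split_codons(seq_b)
protein_a = translate(codons_a)
protein_b = translate(codons_b)

# Codon-level tokens differ, while the translated proteins are identical.
differing_positions = [i for i, (a, b) in enumerate(zip(codons_a, codons_b)) if a != b]
print(codons_a, codons_b, protein_a, protein_b, differing_positions)
```

In this toy case the two inputs are distinct at the codon level but indistinguishable after translation, which is consistent with the abstract's suggestion that codon choice may carry host-related signal that amino acid sequences do not retain.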

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e31/10415530/f91e1bd8ed5f/fgene-14-1145166-g001.jpg
