Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning.

Author information

Electrical Engineering, City University of Hong Kong, Hong Kong, China.

Publication information

BMC Biol. 2021 Nov 24;19(1):250. doi: 10.1186/s12915-021-01180-4.

DOI:10.1186/s12915-021-01180-4

PMID:34819064

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8611875/

Abstract

BACKGROUND

Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of virus-host relationships is required. High-throughput sequencing and its application to the microbiome have opened new opportunities for computational approaches that predict which hosts particular viruses can infect. However, computational host prediction faces two main challenges. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryotic hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction.

RESULTS

In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. A graph convolutional network (GCN) is then adopted to exploit viruses both with and without known hosts during training, which enhances the learning ability. During GCN training, we minimize the expected calibration error (ECE) so that the confidence of the predictions is reliable. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class).
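The two building blocks named above, graph convolution and the expected calibration error, can be illustrated with a minimal sketch. This is not the HostG implementation: it assumes the widely used GCN propagation rule with self-loops and symmetric normalization, and the standard equal-width binned definition of ECE. The abstract does not say which GCN variant HostG uses or how ECE enters the training objective, and all function names below are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix.

    Assumption: self-loops plus symmetric normalization, a common GCN choice.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W).

    Every node (labeled or not) contributes its features to its neighbors,
    which is how unlabeled viruses can still inform the model.
    """
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(n_bins - 1, int(conf * n_bins))
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A low ECE means that predictions made with, say, 90% confidence are correct about 90% of the time, which is what makes a confidence threshold on host predictions meaningful.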

CONCLUSION

HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/974d/8611875/0509bb089793/12915_2021_1180_Fig1_HTML.jpg

References

HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes.

Bioinformatics. 2022 Jan 3;38(2):543-545. doi: 10.1093/bioinformatics/btab585.

RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content.

Patterns (N Y). 2021 Jun 15;2(7):100274. doi: 10.1016/j.patter.2021.100274. eCollection 2021 Jul 9.

Bacteriophage classification for assembled contigs using graph convolutional network.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i25-i33. doi: 10.1093/bioinformatics/btab293.

Global overview and major challenges of host prediction methods for uncultivated phages.

Curr Opin Virol. 2021 Aug;49:117-126. doi: 10.1016/j.coviro.2021.05.003. Epub 2021 Jun 12.

MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab165.

Improving protein domain classification for third-generation sequencing reads using deep learning.

BMC Genomics. 2021 Apr 9;22(1):251. doi: 10.1186/s12864-021-07468-7.

VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families.

Bioinformatics. 2021 Jul 27;37(13):1805-1813. doi: 10.1093/bioinformatics/btab026.

Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics.

BMC Biol. 2021 Jan 14;19(1):5. doi: 10.1186/s12915-020-00938-6.

A network-based integrated framework for predicting virus-prokaryote interactions.

NAR Genom Bioinform. 2020 Jun;2(2):lqaa044. doi: 10.1093/nargab/lqaa044. Epub 2020 Jun 23.

DeepLGP: a novel deep learning method for prioritizing lncRNA target genes.

Bioinformatics. 2020 Aug 15;36(16):4466-4472. doi: 10.1093/bioinformatics/btaa428.
