VirGrapher: a graph-based viral identifier for long sequences from metagenomes.

Author information

College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China.

National Institute for Data Science in Health and Medicine, Xiamen University, Xiangannan Road, 361104, Fujian Province, China.

Publication information

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae036.

Abstract

Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains sequences from all microorganisms in an environmental sample. Correctly identifying viruses among these mixed sequences is critical in viral analyses. In practice, the sequences to be identified are often long ones that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This ignores the relationships between the subsequences and leads to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from each long sequence. VirGrapher treats a long sequence as a graph and, following a GCN-based node embedding model, uses a Graph Convolutional Network (GCN) model to learn multilayer connections between the nodes derived from the sequence. On the validation set, VirGrapher achieves a higher AUC value and accuracy than three benchmark methods.
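The idea of treating one long sequence as a graph of short subsequences can be illustrated with a minimal sketch. It is not VirGrapher's implementation: the abstract does not specify how nodes are featurized or connected, so the fixed-length split, the k-mer frequency features, the chain adjacency between neighbouring subsequences, the mean-pooling readout, and all function names are assumptions made for illustration. The weights are random and untrained, so the returned score has no biological meaning. The only standard component is the GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W).

```python
import numpy as np
from itertools import product


def split_sequence(seq, length):
    """Cut a long sequence into consecutive subsequences (the last may be shorter)."""
    return [seq[i:i + length] for i in range(0, len(seq), length)]


def kmer_features(subseq, k=2):
    """Normalized k-mer frequencies over A/C/G/T (assumed node features)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(subseq) - k + 1):
        j = index.get(subseq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def chain_adjacency(n):
    """Link each subsequence to its neighbours (assumed graph structure)."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A


def gcn_layer(A, X, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)


def score_sequence(seq, length=8, k=2, hidden=4, seed=0):
    """Score a whole sequence from its subsequence graph (untrained weights)."""
    rng = np.random.default_rng(seed)
    subs = split_sequence(seq, length)
    X = np.stack([kmer_features(s, k) for s in subs])
    A = chain_adjacency(len(subs))
    W = rng.normal(size=(X.shape[1], hidden))
    w_out = rng.normal(size=hidden)
    H = gcn_layer(A, X, W)                   # node embeddings that mix neighbours
    graph_embedding = H.mean(axis=0)         # mean-pooling readout over nodes
    return 1.0 / (1.0 + np.exp(-(graph_embedding @ w_out)))


if __name__ == "__main__":
    print(score_sequence("ACGTACGTACGTACGTACGT"))
```

Because each node's embedding is computed from its neighbours as well as itself, the final score depends on how the subsequences relate to one another, which is the property that separate per-subsequence classification lacks.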

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e2b/10859693/11b3b8cc1ab2/bbae036f1.jpg
