CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model.

Authors

Wang Ting, Yu Zu-Guo, Li Jinyan

Affiliations

National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan, China.

Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, China.

Publication information

Front Microbiol. 2024 Mar 20;15:1339156. doi: 10.3389/fmicb.2024.1339156. eCollection 2024.

Abstract

Traditional alignment-based methods face serious challenges in genome sequence comparison and phylogeny reconstruction because of their high computational complexity. Here, we propose a new alignment-free method for analyzing the phylogenetic relationships (classification) among species. In our method, the dynamical language (DL) model and the chaos game representation (CGR) method are used to characterize the frequency information and the context information of k-mers in a sequence, respectively. For each DNA or protein sequence in a dataset, our method converts the sequence into a feature vector, based on CGR weighted by the DL model, that represents the sequence information and is used to infer phylogenetic relationships. We name our method CGRWDL. Its performance was tested by constructing phylogenetic trees from both DNA and protein sequences in 8 virus datasets. For each dataset, we compared the Robinson-Foulds (RF) distance between the tree constructed by CGRWDL and the reference tree with the corresponding distances obtained by other advanced methods. The results show that the phylogenetic trees constructed by CGRWDL accurately classify the viruses, and that the RF scores between these trees and the reference trees are smaller than those obtained with other methods.
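
To make the two ingredients named in the abstract more concrete, the following minimal Python sketch computes plain CGR coordinates and relative k-mer frequencies for a DNA sequence. It is an illustration only, not the CGRWDL implementation: the corner layout in CORNERS is one common convention and is assumed here, the function names cgr_points and kmer_frequencies are hypothetical, and the DL-model weighting described in the abstract is not reproduced.

```python
from collections import Counter

# Assumed corner layout (a common CGR convention; the paper may differ).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one CGR coordinate per unambiguous nucleotide in seq."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_frequencies(seq, k):
    """Return relative frequencies of overlapping k-mers made of A, C, G, T."""
    seq = seq.upper()
    alphabet = set(CORNERS)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Calling cgr_points("ACGT") yields four points in the unit square, and kmer_frequencies("ACGT", 2) yields a dictionary of dinucleotide frequencies. Each CGR point depends on all preceding nucleotides, which is why CGR is described as carrying context information, whereas the k-mer dictionary carries only frequency information.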

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c34c/10987876/e06d38f8b96f/fmicb-15-1339156-g001.jpg
