Phylogenies from unaligned proteomes using sequence environments of amino acid residues.

Affiliation

Department of Molecular Biology and Biochemistry, University of Málaga, 29071, Málaga, Spain.

Publication information

Sci Rep. 2022 May 6;12(1):7497. doi: 10.1038/s41598-022-11370-x.

Abstract

Alignment-free methods for sequence comparison and phylogeny inference have attracted a great deal of attention in recent years, and several algorithms have been implemented in diverse software packages. Despite their number, most existing methods are based on word statistics. Although they propose different filtering and weighting strategies and explore different metrics, their performance may be limited by the phylogenetic signal preserved in these words. Herein, we present a different approach based on species-specific amino acid neighborhood preferences. These differential preferences can be assessed in the context of vector spaces. On this basis, a distance-based method to build phylogenies has been developed and implemented in an easy-to-use R package. Tests run on real-world datasets show that this method can reconstruct phylogenetic relationships with high accuracy and often outperforms other alignment-free approaches. Furthermore, we present evidence that the new method can perform reliably on datasets formed by non-orthologous protein sequences; that is, the method requires neither the identification of orthologous proteins nor their presence in the analyzed dataset. These results suggest that the neighborhood preference of amino acids conveys a phylogenetic signal that may be of great utility in phylogenomics.
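To make the general idea concrete, the following is a minimal Python sketch of how neighborhood preferences might be turned into vectors and pairwise distances. It is not the authors' R implementation. The window radius, the per-residue normalization, the cosine distance, and all function names (`environment_vector`, `cosine_distance`, `distance_matrix`) are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def environment_vector(proteins, radius=2):
    """Build one vector per proteome (hypothetical encoding).

    Each entry is the relative frequency with which a neighbor residue
    occurs at a given offset from a given central residue, normalized by
    the number of neighbor observations for that central residue.
    """
    counts = Counter()
    center_totals = Counter()
    for seq in proteins:
        for i, center in enumerate(seq):
            if center not in AMINO_ACIDS:
                continue  # skip ambiguous or non-standard symbols
            for offset in range(-radius, radius + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(seq):
                    continue
                neighbor = seq[j]
                if neighbor in AMINO_ACIDS:
                    counts[(center, offset, neighbor)] += 1
                    center_totals[center] += 1
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    vector = []
    for center, offset, neighbor in product(AMINO_ACIDS, offsets, AMINO_ACIDS):
        total = center_totals[center]
        vector.append(counts[(center, offset, neighbor)] / total if total else 0.0)
    return vector


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed metric, not taken from the paper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(proteomes, radius=2):
    """Return sorted species names and their pairwise distance matrix."""
    names = sorted(proteomes)
    vectors = {name: environment_vector(proteomes[name], radius) for name in names}
    matrix = [[cosine_distance(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Toy input: species name -> list of unaligned protein sequences.
toy_proteomes = {
    "species_a": ["MKVLAAGIV", "MSTNPKPQR"],
    "species_b": ["MKVLAAGIV", "MSTNPKPQR"],
    "species_c": ["GGGGSGGGG"],
}
names, matrix = distance_matrix(toy_proteomes)
```

A matrix of this kind could then be passed to any distance-based tree builder, such as neighbor joining. Note that nothing in the sketch requires the proteins of different species to be orthologous or aligned, which is the property the abstract emphasizes.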

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c24/9076898/e6774899c7ff/41598_2022_11370_Fig1_HTML.jpg
