Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent.

Affiliations

Faculty of Science, Agriculture and Engineering, School of Natural and Environmental Sciences, Newcastle University, United Kingdom.

Fera Ltd., Biotech Campus, York, United Kingdom.

Publication information

PLoS One. 2024 Mar 21;19(3):e0298834. doi: 10.1371/journal.pone.0298834. eCollection 2024.

Abstract

Current tools for estimating the substitution distance between two related sequences struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as false seeding and over-alignment, create a high barrier for the development of a stable estimator. This is especially true for viral genomes, which carry a high rate of mutation, small size, and sparse taxonomy. Developing an accurate substitution distance measure would help to elucidate the relationship between highly divergent sequences, interrogate their evolutionary history, and better facilitate the discovery of new viral genomes. To tackle these problems, we propose an approach that uses short-read mappers to create whole-genome maps, and gradient descent to isolate the homologous fraction and calculate the final distance value. We implement this approach as Mottle. With the use of simulated and biological sequences, Mottle was able to remain stable to 0.66-0.96 substitutions per base pair and identify viral outgroup genomes with 95% accuracy at the family-order level. Our results indicate that Mottle performs as well as existing programs in identifying taxonomic relationships, with more accurate numerical estimation of genomic distance over greater divergences. By contrast, one limitation is a reduced numerical accuracy at low divergences, and on genomes where insertions and deletions are uncommon, when compared to alternative approaches. We propose that Mottle may therefore be of particular interest in the study of viruses, viral relationships, and notably for viral discovery platforms, helping in benchmarking of homology search tools and defining the limits of taxonomic classification methods. The code for Mottle is available at https://github.com/tphoward/Mottle_Repo.
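For orientation, the unit "substitutions per base pair" can be illustrated with the classical Jukes-Cantor (JC69) correction, which converts an observed mismatch fraction into an expected number of substitutions per site. This is a minimal sketch for intuition only: the abstract does not state which substitution model Mottle uses, and the function names below are hypothetical, not part of Mottle.

```python
import math


def jukes_cantor_distance(p):
    """Expected substitutions per site under JC69, given mismatch fraction p.

    Assumption: JC69 is used purely as an illustration; it is not claimed
    to be the model implemented in Mottle.
    """
    if not 0.0 <= p < 0.75:
        # Under JC69 the mismatch fraction saturates at 0.75.
        raise ValueError("JC69 distance is undefined for p outside [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def observed_mismatch_fraction(d):
    """Inverse mapping: expected mismatch fraction for a JC69 distance d."""
    return 0.75 * (1.0 - math.exp(-(4.0 / 3.0) * d))


if __name__ == "__main__":
    # The stability range quoted in the abstract, in substitutions per site.
    for d in (0.66, 0.96):
        p = observed_mismatch_fraction(d)
        print(f"distance {d:.2f} -> mismatch fraction {p:.3f}")
```

Under this model, distances in the quoted 0.66-0.96 range correspond to mismatch fractions approaching the 0.75 saturation limit, where small errors in the observed mismatch fraction translate into large errors in the estimated distance. That is one way to see why estimation at high divergence is difficult.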
