
Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent.

Affiliations

Faculty of Science, Agriculture and Engineering, School of Natural and Environmental Sciences, Newcastle University, United Kingdom.

Fera Ltd., Biotech Campus, York, United Kingdom.

Publication information

PLoS One. 2024 Mar 21;19(3):e0298834. doi: 10.1371/journal.pone.0298834. eCollection 2024.

DOI:10.1371/journal.pone.0298834
PMID:38512939
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10956839/
Abstract

Current tools for estimating the substitution distance between two related sequences struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as false seeding and over-alignment, create a high barrier for the development of a stable estimator. This is especially true for viral genomes, which carry a high rate of mutation, small size, and sparse taxonomy. Developing an accurate substitution distance measure would help to elucidate the relationship between highly divergent sequences, interrogate their evolutionary history, and better facilitate the discovery of new viral genomes. To tackle these problems, we propose an approach that uses short-read mappers to create whole-genome maps, and gradient descent to isolate the homologous fraction and calculate the final distance value. We implement this approach as Mottle. With the use of simulated and biological sequences, Mottle was able to remain stable to 0.66-0.96 substitutions per base pair and identify viral outgroup genomes with 95% accuracy at the family-order level. Our results indicate that Mottle performs as well as existing programs in identifying taxonomic relationships, with more accurate numerical estimation of genomic distance over greater divergences. By contrast, one limitation is a reduced numerical accuracy at low divergences, and on genomes where insertions and deletions are uncommon, when compared to alternative approaches. We propose that Mottle may therefore be of particular interest in the study of viruses, viral relationships, and notably for viral discovery platforms, helping in benchmarking of homology search tools and defining the limits of taxonomic classification methods. The code for Mottle is available at https://github.com/tphoward/Mottle_Repo.
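
To make the difficulty at high divergence concrete, the sketch below uses the Jukes-Cantor (JC69) model to relate the observed fraction of mismatched sites to substitutions per site. JC69 is a standard textbook correction chosen here purely for illustration; the abstract does not state which substitution model Mottle uses, and the function names are hypothetical. The point is that the correction amplifies any error in the observed mismatch fraction, and the amplification grows quickly across the 0.66-0.96 range quoted above.

```python
import math


def jc69_expected_p(d: float) -> float:
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Substitutions per site implied by an observed mismatch fraction p (JC69)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p outside [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def error_amplification(d: float) -> float:
    """Derivative dd/dp at distance d: the factor by which an error in the
    observed mismatch fraction is magnified in the distance estimate."""
    return math.exp(4.0 * d / 3.0)


if __name__ == "__main__":
    # Illustrative distances only; 0.66 and 0.96 are the bounds quoted in the abstract.
    for d in (0.10, 0.66, 0.96):
        p = jc69_expected_p(d)
        print(f"d={d:.2f}  expected mismatch fraction={p:.3f}  "
              f"error amplification={error_amplification(d):.2f}")
```

Under this model the mismatch fraction saturates towards 0.75, so an estimator working at such distances has to separate genuinely homologous positions from chance matches, which is the role the abstract assigns to the mapping and gradient descent steps.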

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5564/10956839/397824bcde95/pone.0298834.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5564/10956839/3e0dbf737c05/pone.0298834.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5564/10956839/ea8b74238aa1/pone.0298834.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5564/10956839/b7a1895739fd/pone.0298834.g004.jpg

Similar articles

1
Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent.
PLoS One. 2024 Mar 21;19(3):e0298834. doi: 10.1371/journal.pone.0298834. eCollection 2024.
2
Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut.
BMC Genomics. 2014 Jan 18;15:37. doi: 10.1186/1471-2164-15-37.
3
Minimap2: pairwise alignment for nucleotide sequences.
Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.
4
Discovery of tandem and interspersed segmental duplications using high-throughput sequencing.
Bioinformatics. 2019 Oct 15;35(20):3923-3930. doi: 10.1093/bioinformatics/btz237.
5
Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.
Syst Biol. 2017 Mar 1;66(2):218-231. doi: 10.1093/sysbio/syw074.
6
ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.
BMC Genomics. 2019 Apr 3;20(1):267. doi: 10.1186/s12864-019-5571-y.
7
Scoredist: a simple and robust protein sequence distance estimator.
BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108.
8
Cataloguing the taxonomic origins of sequences from a heterogeneous sample using phylogenomics: applications in adventitious agent detection.
PDA J Pharm Sci Technol. 2014 Nov-Dec;68(6):602-18. doi: 10.5731/pdajpst.2014.01023.
9
andi: fast and accurate estimation of evolutionary distances between closely related genomes.
Bioinformatics. 2015 Apr 15;31(8):1169-75. doi: 10.1093/bioinformatics/btu815. Epub 2014 Dec 10.
10
Screening synteny blocks in pairwise genome comparisons through integer programming.
BMC Bioinformatics. 2011 Apr 18;12:102. doi: 10.1186/1471-2105-12-102.
