Accurate detection of tandem repeats exposes ubiquitous reuse of biological sequences.

Authors

Cho Shu-Ting, Wright Erik S

Affiliations

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States.

The Joint CMU-Pitt PhD Program in Computational Biology, Pittsburgh, PA 15213, United States.

Publication information

Nucleic Acids Res. 2025 Sep 5;53(17). doi: 10.1093/nar/gkaf866.

Abstract

Tandem repetition is one of the major processes underlying genome evolution and phenotypic diversification. While newly formed tandem repeats are often easy to identify, it is more challenging to detect repeat copies as they diverge over evolutionary timescales. Existing programs for finding tandem repeats return markedly different results, and it is unclear which predictions are more correct and how much room remains for improvement. Here, we introduce DetectRepeats, a new method that uses empirical information about structural repeats to improve the accuracy of repeat detection. We show that DetectRepeats advances the state-of-the-art by finding highly divergent repeats with relatively few false positive detections. We apply DetectRepeats to genomes across the tree of life to discover an enrichment of detectable tandem repeats within different genes, genome regions, and taxa. Furthermore, we use phylogenetic reconciliation to determine that some tandem repeats continue to evolve through intra-repeat unit replacement. In this manner, tandem repeats serve as a renewable genetic resource offering a bountiful source of alternative genetic material. Our work unlocks the confident detection of ancient tandem repeats, opening a doorway to future discoveries. DetectRepeats is part of the DECIPHER package for the R programming language and available via Bioconductor.
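To make the contrast between "newly formed" and diverged repeats concrete, the following is a minimal Python sketch of a naive detector that only finds exact, back-to-back copies of a short unit. It is not DetectRepeats and does not reflect how that method works; the function name `find_exact_tandem_repeats` and its parameters (`min_unit`, `max_unit`, `min_copies`) are hypothetical and chosen only for illustration.

```python
def find_exact_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for runs of exactly repeated units.

    Illustrative only: a greedy left-to-right scan that requires
    perfect identity between adjacent copies.
    """
    hits = []
    i = 0
    n = len(seq)
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count how many adjacent windows are identical to the unit.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (
                best is None or span > best[2] * len(best[1])
            ):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported run
        else:
            i += 1
    return hits


# A perfect (AC)x4 run is found; a sequence with no exact run yields nothing.
perfect = find_exact_tandem_repeats("GGACACACACTT")
none_found = find_exact_tandem_repeats("ACGTTGCA")
```

An exact-match scan like this stops reporting a repeat once substitutions or indels interrupt the adjacent copies, which is the harder case of diverged repeats that the abstract describes.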

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f59d/12418385/5ce099b4c905/gkaf866figgra1.jpg
