REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

Authors

Chu Chong, Nielsen Rasmus, Wu Yufeng

Affiliations

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, United States of America.

Department of Integrative Biology, University of California, Berkeley, CA 94720, United States of America.

Publication information

PLoS One. 2016 Mar 15;11(3):e0150719. doi: 10.1371/journal.pone.0150719. eCollection 2016.

Abstract

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often have missing data in highly repetitive regions, which are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence among copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA used when sequencing the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
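The abstract does not describe how highly repetitive sequence is detected in raw reads. As a rough illustration only, one common starting point is to count k-mers across the reads and keep those that occur far more often than average, since sequence from high-copy repeats is over-represented in shotgun data. The sketch below assumes this k-mer approach; the function names, the toy reads, and the `fold` cutoff are all hypothetical and are not taken from the REPdenovo software.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_frequency_kmers(counts, fold):
    """Return k-mers seen at least `fold` times the mean k-mer count.

    `fold` is an illustrative cutoff, not a parameter of any real tool.
    """
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)
    return {kmer for kmer, n in counts.items() if n >= fold * mean}


# Toy data: three reads drawn from a repeated unit, one from unique sequence.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGCA"]
counts = count_kmers(reads, k=4)
repeat_kmers = high_frequency_kmers(counts, fold=2)
print(sorted(repeat_kmers))
```

A real pipeline would also need to handle reverse complements, sequencing errors, and the assembly of the selected k-mers into longer repeat sequences, none of which this sketch attempts.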

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/491b/4792456/849cdc71213c/pone.0150719.g001.jpg
