ULTRA-Effective Labeling of Repetitive Genomic Sequence.

Authors

Olson Daniel R, Wheeler Travis J

Affiliations

Department of Computer Science, University of Montana, Missoula, MT, USA.

R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA.

Publication information

bioRxiv. 2024 Jun 4:2024.06.03.597269. doi: 10.1101/2024.06.03.597269.

Abstract

In the age of long-read sequencing, genomics researchers now have access to accurate repetitive DNA sequence (including satellites) that, due to the limitations of short-read sequencing, could previously be observed only as unmappable fragments. Tools that annotate repetitive sequence are now more important than ever, so that we can better understand newly uncovered repetitive sequences, and also so that we can mitigate errors in bioinformatic software caused by those repetitive sequences. To that end, we introduce the 1.0 release of our tool for identifying and annotating locally repetitive sequence, ULTRA (ULTRA Locates Tandemly Repetitive Areas). ULTRA is fast enough to use as part of an efficient annotation pipeline, produces state-of-the-art reliable coverage of repetitive regions containing many mutations, and provides interpretable statistics and labels for repetitive regions. It is released under an open license and is available for download at https://github.com/TravisWheelerLab/ULTRA.
