
ULTRA-Effective Labeling of Repetitive Genomic Sequence.

Author information

Olson Daniel R, Wheeler Travis J

Affiliations

Department of Computer Science, University of Montana, Missoula, MT, USA.

R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA.

Publication information

bioRxiv. 2024 Jun 4:2024.06.03.597269. doi: 10.1101/2024.06.03.597269.

DOI:10.1101/2024.06.03.597269

PMID:38895435

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11185745/

Abstract

In the age of long-read sequencing, genomics researchers now have access to accurate repetitive DNA sequence (including satellites) that, due to the limitations of short-read sequencing, could previously be observed only as unmappable fragments. Tools that annotate repetitive sequence are now more important than ever, both so that we can better understand newly uncovered repetitive sequences and so that we can mitigate the errors those sequences cause in bioinformatic software. To that end, we introduce the 1.0 release of ULTRA (ULTRA Locates Tandemly Repetitive Areas), our tool for identifying and annotating locally repetitive sequence. ULTRA is fast enough to use as part of an efficient annotation pipeline, produces state-of-the-art, reliable coverage of repetitive regions containing many mutations, and provides interpretable statistics and labels for repetitive regions. It is released under an open license and is available for download at https://github.com/TravisWheelerLab/ULTRA.
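To make "locally repetitive" (tandem) sequence concrete, the following is a minimal Python sketch that finds exact tandem repeats by checking whether each base matches the base one period earlier. It is a toy illustration, not ULTRA's algorithm: the function name `find_exact_tandem_repeats` and its parameters (`max_period`, `min_copies`) are hypothetical, and unlike ULTRA, which the abstract describes as covering repetitive regions that contain many mutations, this scan reports only mutation-free runs.

```python
from typing import List, Tuple


def find_exact_tandem_repeats(
    seq: str, max_period: int = 6, min_copies: int = 3
) -> List[Tuple[int, int, int]]:
    """Return (start, end, period) for maximal exact tandem runs.

    Hypothetical toy scanner (not ULTRA): a run with period p covers
    seq[start:end] where seq[j] == seq[j - p] for every j in
    [start + p, end). Runs with fewer than min_copies full copies of the
    repeat unit are dropped, and a run is skipped if an already-reported
    run with a smaller period that divides p covers the same span
    (e.g. "ATAT" units inside an "AT" repeat).
    """
    seq = seq.upper()
    hits: List[Tuple[int, int, int]] = []
    # Periods are processed in ascending order, so smaller periods win.
    for p in range(1, max_period + 1):
        j = p
        while j < len(seq):
            if seq[j] != seq[j - p]:
                j += 1
                continue
            # Extend a run of consecutive period-p matches starting at j.
            run_start = j
            while j < len(seq) and seq[j] == seq[j - p]:
                j += 1
            start, end = run_start - p, j
            covered = any(
                p % q == 0 and s <= start and end <= e for s, e, q in hits
            )
            if (end - start) // p >= min_copies and not covered:
                hits.append((start, end, p))
    return sorted(hits)


if __name__ == "__main__":
    # A (CA)n microsatellite flanked by non-repetitive bases.
    print(find_exact_tandem_repeats("GGCACACACACAGTT"))  # [(2, 12, 2)]
```

Because a single substitution breaks the `seq[j] == seq[j - p]` chain, this exact-match scan fragments or misses the mutated repeats that the abstract emphasizes, which is the gap a mutation-tolerant annotator such as ULTRA is meant to close.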


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/366b/11185745/495032691063/nihpp-2024.06.03.597269v1-f0001.jpg

Similar articles

ULTRA-Effective Labeling of Repetitive Genomic Sequence.

bioRxiv. 2024 Jun 4:2024.06.03.597269. doi: 10.1101/2024.06.03.597269.

ULTRA-effective labeling of tandem repeats in genomic sequence.

Bioinform Adv. 2024 Oct 9;4(1):vbae149. doi: 10.1093/bioadv/vbae149. eCollection 2024.

nail: software for high-speed, high-sensitivity protein sequence annotation.

bioRxiv. 2024 Jan 30:2024.01.27.577580. doi: 10.1101/2024.01.27.577580.

SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.

J Bioinform Comput Biol. 2024 Oct;22(5):2450022. doi: 10.1142/S0219720024500227. Epub 2024 Oct 1.

Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data.

BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):606. doi: 10.1186/s12859-019-3280-9.

Sensitive and error-tolerant annotation of protein-coding DNA with BATH.

Bioinform Adv. 2024 Jun 14;4(1):vbae088. doi: 10.1093/bioadv/vbae088. eCollection 2024.

Accurately estimating the length distributions of genomic micro-satellites by tumor purity deconvolution.

BMC Bioinformatics. 2020 Mar 11;21(Suppl 2):82. doi: 10.1186/s12859-020-3349-5.

nf-core/pacvar: a pipeline for analyzing long-read PacBio whole genome and repeat expansion sequencing data.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf116.

Beav: a bacterial genome and mobile element annotation pipeline.

mSphere. 2024 Aug 28;9(8):e0020924. doi: 10.1128/msphere.00209-24. Epub 2024 Jul 22.

BleTIES: annotation of natural genome editing in ciliates using long read sequencing.

Bioinformatics. 2021 Nov 5;37(21):3929-3931. doi: 10.1093/bioinformatics/btab613.

