Noise-cancelling repeat finder: uncovering tandem repeats in error-prone long-read sequencing data.

Affiliations

Department of Biology, The Pennsylvania State University, State College, PA 16802, USA.

Center for Medical Genomics, The Pennsylvania State University, State College, PA 16802, USA.

Publication information

Bioinformatics. 2019 Nov 1;35(22):4809-4811. doi: 10.1093/bioinformatics/btz484.

DOI:10.1093/bioinformatics/btz484

PMID:31290946

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6853708/

Abstract

SUMMARY

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.
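
The summary does not describe how NCRF works internally, so the following is not NCRF's implementation. It is a minimal sketch of one generic way to score a noisy read against a tandemly repeated motif: local alignment in which the motif is treated as a loop (wraparound dynamic programming). The function name `wraparound_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def wraparound_align(read, motif, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of `read` against the motif repeated in tandem.

    Returns (best_score, end), where `end` is the 1-based read position at
    which the best-scoring alignment ends. Hypothetical helper, linear gap
    penalty, illustrative scores only.
    """
    m = len(motif)
    prev = [0] * m  # scores for the previous read base, one per motif column
    best_score, best_end = 0, 0
    for i, base in enumerate(read, start=1):
        cur = [0] * m
        # First pass: ordinary local-alignment recurrence. The diagonal move
        # wraps, because prev[-1] is the last motif column.
        for j in range(m):
            sub = match if base == motif[j] else mismatch
            diag = prev[j - 1] + sub                  # read base vs motif base
            up = prev[j] + gap                        # extra base in the read
            left = cur[j - 1] + gap if j > 0 else 0   # motif base skipped
            cur[j] = max(0, diag, up, left)
        # Second pass: let skipped motif bases wrap from the last column back
        # to the first, so a deletion can span a repeat-unit boundary.
        for j in range(m):
            wrapped = cur[j - 1] + gap                # cur[-1] is the last column
            if wrapped > cur[j]:
                cur[j] = wrapped
        row_best = max(cur)
        if row_best > best_score:
            best_score, best_end = row_best, i
        prev = cur
    return best_score, best_end


# An (AATGG)n array with one deleted base, flanked by unrelated sequence.
noisy = "CCCC" + "AATGG" * 2 + "AATG" + "AATGG" * 2 + "CCCC"
print(wraparound_align(noisy, "AATGG"))
```

Because the motif is a loop, the score keeps growing across repeat units without the number of copies being fixed in advance, and isolated sequencing errors cost a small penalty rather than ending the alignment. A practical tool would also need traceback to report the start position, both strands, and error-profile-specific scoring, none of which is shown here.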

AVAILABILITY AND IMPLEMENTATION

NCRF is implemented in C, supported by several Python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/483f/6853708/c983cca2e0fc/btz484f1.jpg
