Repetitive DNA and next-generation sequencing: computational challenges and solutions.

Affiliation

McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.

Publication information

Nat Rev Genet. 2011 Nov 29;13(1):36-46. doi: 10.1038/nrg3117.

Abstract

Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
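The alignment ambiguity described above can be illustrated with a toy example. The following is a minimal sketch, not a method from the article: the genome string, the repeat, the reads, and the helper `find_exact_matches` are all hypothetical, and exact string matching stands in for a real aligner. It shows that a short read lying wholly inside a repeat matches more than one location, while a read long enough to reach into unique flanking sequence matches only one.

```python
def find_exact_matches(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` occurs exactly in `genome`.

    Hypothetical helper for illustration; real aligners use indexes and
    tolerate mismatches, which this sketch does not.
    """
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping occurrences are also found.
        start = genome.find(read, start + 1)
    return positions


# Toy genome (assumed for illustration): two copies of the same repeat,
# separated and flanked by unique sequence.
REPEAT = "ACGTACGGTC"
GENOME = "TTGACCA" + REPEAT + "GGATTCA" + REPEAT + "CCTAGAT"

# A short read that lies entirely inside the repeat.
short_read = REPEAT[2:8]

# A longer read that starts in the unique left flank and runs into the repeat.
long_read = "CCA" + REPEAT[:6]

short_hits = find_exact_matches(GENOME, short_read)
long_hits = find_exact_matches(GENOME, long_read)

print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

In this toy setting the short read has two equally good placements, one per repeat copy, so its true origin cannot be determined from the sequence alone. The longer read has a single placement because part of it overlaps unique flanking sequence.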

