Repetitive DNA and next-generation sequencing: computational challenges and solutions.

Affiliation

McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.

Publication information

Nat Rev Genet. 2011 Nov 29;13(1):36-46. doi: 10.1038/nrg3117.

DOI: 10.1038/nrg3117
PMID: 22124482
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3324860/
Abstract

Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
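The alignment ambiguity the abstract describes can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and does not come from the paper: the function name `exact_match_positions`, the made-up reference sequence, and the use of exact string matching (real aligners use indexed, error-tolerant search). It only shows why a short read that lies entirely inside a repeat has several equally good placements, while a longer read that reaches into unique flanking sequence has one.

```python
def exact_match_positions(read: str, reference: str) -> list[int]:
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping hits are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference: two identical copies of an 8-base repeat,
# each surrounded by different (unique) flanking sequence.
REPEAT = "GATTACAG"
REFERENCE = "TTGC" + REPEAT + "CCATGG" + REPEAT + "TCAA"

# A short read that falls entirely inside the repeat is ambiguous.
short_read = REPEAT
print(exact_match_positions(short_read, REFERENCE))  # [4, 18]

# A longer read that extends into the left flank is placed uniquely.
long_read = REFERENCE[1:12]  # "TGCGATTACAG"
print(exact_match_positions(long_read, REFERENCE))  # [1]
```

In this sketch the short read has two candidate positions, so any downstream count or variant call depends on how the tool resolves the tie; the longer read avoids the tie because part of it lies outside the repeat.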


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6264/3324860/7fb4f231354a/nihms366744f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6264/3324860/caa080161e3c/nihms366744f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6264/3324860/313bce58d0ac/nihms366744f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6264/3324860/1aab4d73eb74/nihms366744f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6264/3324860/d47137a4c936/nihms366744f5.jpg

