Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.

Authors

Tammi Martti T, Arner Erik, Britton Tom, Andersson Björn

Affiliation

Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

Publication information

Bioinformatics. 2002 Mar;18(3):379-88. doi: 10.1093/bioinformatics/18.3.379.

DOI:10.1093/bioinformatics/18.3.379
PMID:11934736
Abstract

An increasingly important problem in genome sequencing is the failure of the commonly used shotgun assembly programs to correctly assemble repetitive sequences. The assembly of non-repetitive regions or regions containing repeats considerably shorter than the average read length is in practice easy to solve, while longer repeats have been a difficult problem. We here present a statistical method to separate arbitrarily long, almost identical repeats, which makes it possible to correctly assemble complex repetitive sequence regions. The differences between repeat units may be as low as 1% and the sequencing error may be up to ten times higher. The method is based on the realization that a comparison of only a part of all overlapping sequences at a time in a data set does not generate enough information for a conclusive analysis. Our method uses optimal multi-alignments consisting of all the overlaps of each read. This makes it possible to determine defined nucleotide positions, DNPs, which constitute the differences between the repeat units. Differences between repeats are distinguished from sequencing errors using statistical methods, where the probabilities of obtaining certain combinations of candidate DNPs are calculated using the information from the multi-alignments. The use of DNPs and combinations of DNPs will allow for optimal and rapid assemblies of repeated regions. This method can solve repeats that differ in only two positions in a read length, which is the theoretical limit for repeat separation. We predict that this method will be highly useful in shotgun sequencing in the future.
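The abstract does not give the statistic itself, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes a simplified null model in which every deviation from the column majority is an independent sequencing error with a fixed per-base rate. The function names (`minority_reads`, `coinciding_reads`, `coincidence_pvalue`) and the example columns are hypothetical. The point it illustrates is that a single deviating base is easily explained by error, while the same reads deviating together in two columns is very unlikely under independent errors, which is what marks the columns as candidate DNPs.

```python
from collections import Counter
from math import comb


def minority_reads(column: str) -> set[int]:
    """Indices of reads whose base differs from the column's majority base."""
    majority = Counter(column).most_common(1)[0][0]
    return {i for i, base in enumerate(column) if base != majority}


def coinciding_reads(column_a: str, column_b: str) -> int:
    """Number of reads that deviate from the majority in both columns."""
    return len(minority_reads(column_a) & minority_reads(column_b))


def coincidence_pvalue(n_reads: int, k: int, error_rate: float) -> float:
    """P(at least k of n_reads deviate in both columns) under the assumed
    null model of independent errors (binomial with p = error_rate ** 2)."""
    p_both = error_rate ** 2
    return sum(
        comb(n_reads, i) * p_both ** i * (1 - p_both) ** (n_reads - i)
        for i in range(k, n_reads + 1)
    )


# Hypothetical multi-alignment columns: one character per read, same read order.
col_a = "AAAAAAAGGG"
col_b = "CCCCCCCTTT"

k = coinciding_reads(col_a, col_b)
# 0.10 is an assumed error rate, ten times the 1% repeat difference
# mentioned in the abstract.
p = coincidence_pvalue(len(col_a), k, error_rate=0.10)
print(f"{k} reads deviate in both columns; null-model probability {p:.2e}")
```

This sketch counts any deviation from the majority as a potential error. Requiring the deviating reads to carry the same substitute base would make the coincidence even less likely under the null model.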

Similar articles

1. Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.
Bioinformatics. 2002 Mar;18(3):379-88. doi: 10.1093/bioinformatics/18.3.379.
2. Correcting errors in shotgun sequences.
Nucleic Acids Res. 2003 Aug 1;31(15):4663-72. doi: 10.1093/nar/gkg653.
3. Correcting base-assignment errors in repeat regions of shotgun assembly.
IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):54-64. doi: 10.1109/TCBB.2007.1005.
4. A sensitive repeat identification framework based on short and long reads.
Nucleic Acids Res. 2021 Sep 27;49(17):e100. doi: 10.1093/nar/gkab563.
5. TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences.
Comput Methods Programs Biomed. 2003 Jan;70(1):47-59. doi: 10.1016/s0169-2607(01)00194-8.
6. Repetitive DNA and next-generation sequencing: computational challenges and solutions.
Nat Rev Genet. 2011 Nov 29;13(1):36-46. doi: 10.1038/nrg3117.
7. Comparative genomics study of inverted repeats in bacteria.
Bioinformatics. 2002 Jul;18(7):971-9. doi: 10.1093/bioinformatics/18.7.971.
8. De novo repeat classification and fragment assembly.
Genome Res. 2004 Sep;14(9):1786-96. doi: 10.1101/gr.2395204.
9. AMASS: a structured pattern matching approach to shotgun sequence assembly.
J Comput Biol. 1999 Summer;6(2):163-86. doi: 10.1089/cmb.1999.6.163.
10. Polypolish: Short-read polishing of long-read bacterial genome assemblies.
PLoS Comput Biol. 2022 Jan 24;18(1):e1009802. doi: 10.1371/journal.pcbi.1009802. eCollection 2022 Jan.

Cited by

1. Resolving repeat families with long reads.
BMC Bioinformatics. 2019 May 9;20(1):232. doi: 10.1186/s12859-019-2807-4.
2. Deep repeat resolution-the assembly of the Drosophila Histone Complex.
Nucleic Acids Res. 2019 Feb 20;47(3):e18. doi: 10.1093/nar/gky1194.
3. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.
Brief Bioinform. 2016 Jan;17(1):154-79. doi: 10.1093/bib/bbv029. Epub 2015 May 29.
4. SeqEntropy: genome-wide assessment of repeats for short read sequencing.
PLoS One. 2013;8(3):e59484. doi: 10.1371/journal.pone.0059484. Epub 2013 Mar 27.
5. SEQuel: improving the accuracy of genome assemblies.
Bioinformatics. 2012 Jun 15;28(12):i188-96. doi: 10.1093/bioinformatics/bts219.
6. Genome assembly reborn: recent computational challenges.
Brief Bioinform. 2009 Jul;10(4):354-66. doi: 10.1093/bib/bbp026. Epub 2009 May 29.
7. Viral population estimation using pyrosequencing.
PLoS Comput Biol. 2008 May 9;4(4):e1000074. doi: 10.1371/journal.pcbi.1000074.
8. Genome assembly forensics: finding the elusive mis-assembly.
Genome Biol. 2008;9(3):R55. doi: 10.1186/gb-2008-9-3-r55. Epub 2008 Mar 14.
9. Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants.
BMC Genomics. 2007 Oct 26;8:391. doi: 10.1186/1471-2164-8-391.
10. DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions.
BMC Bioinformatics. 2006 Mar 20;7:155. doi: 10.1186/1471-2105-7-155.