Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies.

Affiliation

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.

Publication details

BMC Bioinformatics. 2011 Apr 13;12:95. doi: 10.1186/1471-2105-12-95.

DOI: 10.1186/1471-2105-12-95
PMID: 21486487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3103447/
Abstract

BACKGROUND

Next-generation sequencing technologies allow genomes to be sequenced more quickly and less expensively than ever before. However, as sequencing technology has improved, the difficulty of de novo genome assembly has increased, due in large part to the shorter reads generated by the new technologies. The use of mated sequences (referred to as mate-pairs) is a standard means of disambiguating assemblies to obtain a more complete picture of the genome without resorting to manual finishing. Here, we examine the effectiveness of mate-pair information in resolving repeated sequences in the DNA (a paramount issue to overcome). While it has been empirically accepted that mate-pairs improve assemblies, and a variety of assemblers use mate-pairs in the context of repeat resolution, the effectiveness of mate-pairs in this context has not been systematically evaluated in previous literature.
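"Resolving a repeat with mate-pairs" can be made concrete with a simplified geometric model. This model is our assumption, not the paper's exact criterion. A pair bridges one copy of a repeat only if one read lies entirely in the unique sequence before the repeat and its mate lies entirely in the unique sequence after it. The function names `bridging_positions` and `can_bridge` are hypothetical. `insert_size` is taken as the outer distance from the start of the first read to the end of the second.

```python
def bridging_positions(insert_size: int, read_len: int, repeat_len: int) -> int:
    """Count fragment start positions whose two reads both land in unique
    flanking sequence, one on each side of a single repeat copy.

    Simplified model (an assumption): reads are error-free, the insert size
    is fixed (no variance), and each flank is long enough to hold a read.
    A pair can bridge the repeat only if insert_size >= repeat_len + 2 * read_len.
    """
    return max(0, insert_size - repeat_len - 2 * read_len + 1)


def can_bridge(insert_size: int, read_len: int, repeat_len: int) -> bool:
    """True if at least one placement of the pair spans the repeat."""
    return bridging_positions(insert_size, read_len, repeat_len) > 0


if __name__ == "__main__":
    read_len = 36      # hypothetical short-read length (bp)
    insert_size = 200  # hypothetical mate-pair insert size (bp)
    # Longest repeat this library can bridge under the model: 200 - 2*36
    print(insert_size - 2 * read_len)              # 128
    print(can_bridge(insert_size, read_len, 128))  # True
    print(can_bridge(insert_size, read_len, 129))  # False
```

The condition follows from placing the first read at or before the repeat start and the second read at or after the repeat end. Both reads must fit inside the insert, so the insert must be at least the repeat length plus two read lengths.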

RESULTS

We show that, in high-coverage prokaryotic assemblies, libraries of short mate-pairs (about 4-6 times the read length) disambiguate repeat regions more effectively than the libraries commonly constructed in current genome projects. We also demonstrate that the best assemblies can be obtained by 'tuning' mate-pair libraries to the specific repeat structure of the genome being assembled. This information can be obtained through an initial assembly using unpaired reads. These results are shown across 360 simulations on 'ideal' prokaryotic data, as well as in assemblies of 8 bacterial genomes using SOAPdenovo. The simulation results provide an upper bound on the potential value of mate-pairs for resolving repeated sequences in real prokaryotic data sets. The assembly results show that our method of tuning mate-pairs exploits fundamental properties of these genomes. It leads to better assemblies even when using an off-the-shelf assembler in the presence of base-call errors.
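The "about 4-6 times the read length" range is simple arithmetic. For example, 36 bp reads give inserts of roughly 144-216 bp, and 100 bp reads give roughly 400-600 bp.

The sketch below pairs that arithmetic with one possible way to "tune" a library. Given repeat lengths estimated from an initial unpaired assembly, it picks the smallest insert that could bridge a chosen fraction of them. Several parts are hypothetical illustrations, not the authors' procedure:

- the selection rule;
- the function names;
- the `target_fraction` parameter;
- the example repeat lengths.

The bridging condition `insert >= repeat + 2 * read` is also a simplifying assumption: it ignores insert-size variance and sequencing errors.

```python
import math
from typing import Sequence


def short_mate_pair_range(read_len: int) -> tuple[int, int]:
    """Insert sizes at roughly 4x and 6x the read length."""
    return 4 * read_len, 6 * read_len


def smallest_insert_for(repeat_lens: Sequence[int], read_len: int,
                        target_fraction: float = 0.9) -> int:
    """Smallest insert size that could bridge at least `target_fraction`
    of the given repeats, assuming a repeat of length R is bridgeable
    when insert >= R + 2 * read_len (a simplifying assumption)."""
    if not repeat_lens:
        raise ValueError("no repeats supplied")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    ordered = sorted(repeat_lens)
    # Number of repeats that must be bridged, rounded up (at least one);
    # round() guards against float noise such as 8.000000000000002.
    needed = max(1, math.ceil(round(target_fraction * len(ordered), 9)))
    # Bridging the `needed` shortest repeats means covering the longest of them.
    return ordered[needed - 1] + 2 * read_len


if __name__ == "__main__":
    print(short_mate_pair_range(36))   # (144, 216)
    print(short_mate_pair_range(100))  # (400, 600)
    # Hypothetical repeat lengths (bp) from an initial unpaired assembly.
    repeats = [40, 55, 60, 70, 90, 120, 150, 300, 1500, 5000]
    print(smallest_insert_for(repeats, read_len=36, target_fraction=0.8))  # 372
```

Returning the smallest qualifying insert is a design choice of this sketch. A real library design would also have to account for insert-size variance, coverage, and base-call errors, none of which the model includes.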

CONCLUSIONS

Our results demonstrate that dramatic improvements in prokaryotic genome assembly quality can be achieved by tuning mate-pair sizes to the actual repeat structure of a genome. This suggests that the way sequencing projects are designed may need to change. We propose a two-tiered approach. First, assemble the genome from unpaired reads to evaluate its repeat structure. Then, generate the mate-pair libraries that provide the most information for resolving the repeats in that genome. This approach is not only possible but likely also more cost-effective, as it will significantly reduce downstream manual finishing costs. In future work we intend to address whether this result can be extended to larger eukaryotic genomes, where repeat structure can be quite different.
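The first tier of the proposed approach is evaluating repeat structure from an unpaired assembly, and this can be approximated in several ways. The heuristic below is an assumption, not the paper's method. It treats contigs whose read depth is well above the genome-wide median as collapsed repeats and estimates their copy number from the depth ratio. The `Contig` type, `find_repeat_contigs`, the 1.5x threshold and the example contigs are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Contig:
    name: str
    length: int      # bp
    coverage: float  # mean read depth over the contig


def find_repeat_contigs(contigs: list[Contig],
                        ratio: float = 1.5) -> list[tuple[str, int, int]]:
    """Flag contigs whose depth is at least `ratio` times the median depth.

    Returns (name, length, estimated_copy_number) tuples. Assumes most of the
    genome is single-copy, so the median depth approximates one-copy depth.
    """
    base = median(c.coverage for c in contigs)
    repeats = []
    for c in contigs:
        if c.coverage >= ratio * base:
            copies = round(c.coverage / base)
            repeats.append((c.name, c.length, copies))
    return repeats


if __name__ == "__main__":
    # Hypothetical contigs from an unpaired-read assembly.
    contigs = [
        Contig("ctg1", 52000, 40.0),
        Contig("ctg2", 31000, 42.0),
        Contig("ctg3", 1500, 121.0),  # depth suggests a ~3-copy repeat
        Contig("ctg4", 18000, 39.0),
        Contig("ctg5", 300, 79.0),    # depth suggests a ~2-copy repeat
    ]
    for name, length, copies in find_repeat_contigs(contigs):
        print(name, length, copies)
    # ctg3 1500 3
    # ctg5 300 2
```

The flagged lengths and copy numbers stand in for the "repeat structure" that the second tier would use when choosing mate-pair insert sizes.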

Figures 1-5 (PMC image files):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab8e/3103447/67d1c00b3108/1471-2105-12-95-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab8e/3103447/f9c161a0a6f4/1471-2105-12-95-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab8e/3103447/fc2d4a576976/1471-2105-12-95-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab8e/3103447/b965b0ee5e9b/1471-2105-12-95-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab8e/3103447/e6983fa1ceb0/1471-2105-12-95-5.jpg

Similar articles

1
Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies.
BMC Bioinformatics. 2011 Apr 13;12:95. doi: 10.1186/1471-2105-12-95.
2
SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome.
BMC Bioinformatics. 2015 Sep 16;16(1):295. doi: 10.1186/s12859-015-0726-6.
3
Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad100. Epub 2023 Nov 24.
4
Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case.
BMC Genomics. 2018 Dec 29;19(1):977. doi: 10.1186/s12864-018-5348-8.
5
Paired de bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers.
J Comput Biol. 2011 Nov;18(11):1625-34. doi: 10.1089/cmb.2011.0151. Epub 2011 Oct 14.
6
SOPRA: Scaffolding algorithm for paired reads via statistical optimization.
BMC Bioinformatics. 2010 Jun 24;11:345. doi: 10.1186/1471-2105-11-345.
7
Assembling short reads from jumping libraries with large insert sizes.
Bioinformatics. 2015 Oct 15;31(20):3262-8. doi: 10.1093/bioinformatics/btv337. Epub 2015 Jun 3.
8
B-assembler: a circular bacterial genome assembler.
BMC Genomics. 2022 May 11;23(Suppl 4):361. doi: 10.1186/s12864-022-08577-7.
9
Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.
Front Bioeng Biotechnol. 2015 Sep 17;3:141. doi: 10.3389/fbioe.2015.00141. eCollection 2015.
10
GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.
PLoS One. 2014 Sep 8;9(9):e107014. doi: 10.1371/journal.pone.0107014. eCollection 2014.

Cited by

1
Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing.
PLoS Comput Biol. 2023 Mar 2;19(3):e1010905. doi: 10.1371/journal.pcbi.1010905. eCollection 2023 Mar.
2
Optimization of the "" mate-pair method improves contiguity and accuracy of genome assembly.
Ecol Evol. 2023 Jan 11;13(1):e9745. doi: 10.1002/ece3.9745. eCollection 2023 Jan.
3
The Complete Genome Sequence and Structure of the Oleaginous Strain PD630 Through Nanopore Technology.
Front Bioeng Biotechnol. 2022 Feb 17;9:810571. doi: 10.3389/fbioe.2021.810571. eCollection 2021.
4
Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data.
Methods Mol Biol. 2021;2369:27-40. doi: 10.1007/978-1-0716-1681-9_3.
5
Refinement of Draft Genome Assemblies of Pigeonpea ().
Front Genet. 2020 Dec 15;11:607432. doi: 10.3389/fgene.2020.607432. eCollection 2020.
6
Taro Genome Assembly and Linkage Map Reveal QTLs for Resistance to Taro Leaf Blight.
G3 (Bethesda). 2020 Aug 5;10(8):2763-2775. doi: 10.1534/g3.120.401367.
7
Modern technologies and algorithms for scaffolding assembled genomes.
PLoS Comput Biol. 2019 Jun 5;15(6):e1006994. doi: 10.1371/journal.pcbi.1006994. eCollection 2019 Jun.
8
Improving draft genome contiguity with reference-derived in silico mate-pair libraries.
Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy029.
9
Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi.
BMC Genomics. 2017 May 30;18(1):422. doi: 10.1186/s12864-017-3804-5.
10
Single-Molecule Sequencing of the Genome.
G3 (Bethesda). 2017 Mar 10;7(3):781-788. doi: 10.1534/g3.116.037598.