Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly.

Affiliation

Institute of Bioinformatics and Biosignal Transduction, National Cheng Kung University, Tainan, Taiwan.

Publication information

PLoS One. 2013 Jul 29;8(7):e69503. doi: 10.1371/journal.pone.0069503. Print 2013.

DOI:10.1371/journal.pone.0069503
PMID:23922726
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3726674/
Abstract

Next-generation sequencing (NGS) offers much higher data throughput at much lower cost than the traditional Sanger method. However, NGS reads are shorter than Sanger reads, which makes de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires long reads or long-distance information. To improve de novo genome assembly, we developed a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes Illumina paired-end (PE) reads as input and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of the recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. Assembly accuracy dropped in these cases but remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for it to also increase assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
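
The contiguity figures in the abstract are N50 lengths. As a minimal sketch, assuming the common definition (sort contigs from longest to shortest and take the length of the contig at which the running total first reaches half of the total assembled length), N50 can be computed as shown here. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the paper or from ARF-PE.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the length L of the contig at which the
    cumulative length of contigs, sorted longest first, first reaches
    at least half of the total assembled length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Not reachable: the running total always reaches the full total.
    raise AssertionError("cumulative length never reached half of total")


# Hypothetical contig lengths (not data from the paper).
fragmented = [80, 70, 50, 30, 20]  # total 250, half is 125
joined = [150, 50, 30, 20]         # same total after joining 80 + 70

print(n50(fragmented))  # 70
print(n50(joined))      # 150
```

In this toy example, joining the two longest contigs raises N50 from 70 to 150 without changing the total assembled length, which is the kind of contiguity gain the abstract reports. Some evaluations instead measure against half of a reference genome size (often called NG50); the abstract reports N50 only.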

Similar articles

1
Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly.
PLoS One. 2013 Jul 29;8(7):e69503. doi: 10.1371/journal.pone.0069503. Print 2013.
2
Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology.
BMC Genomics. 2013 Oct 17;14(1):711. doi: 10.1186/1471-2164-14-711.
3
A pilot study for channel catfish whole genome sequencing and de novo assembly.
BMC Genomics. 2011 Dec 22;12:629. doi: 10.1186/1471-2164-12-629.
4
Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing.
BMC Genomics. 2015 Aug 28;16(1):648. doi: 10.1186/s12864-015-1859-8.
5
Paired-end sequencing of long-range DNA fragments for de novo assembly of large, complex Mammalian genomes by direct intra-molecule ligation.
PLoS One. 2012;7(9):e46211. doi: 10.1371/journal.pone.0046211. Epub 2012 Sep 27.
6
Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.
PLoS One. 2013 Apr 12;8(4):e60204. doi: 10.1371/journal.pone.0060204. Print 2013.
7
HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads.
BMC Genomics. 2016 Mar 5;17:193. doi: 10.1186/s12864-016-2515-7.
8
Effects of GC bias in next-generation-sequencing data on de novo genome assembly.
PLoS One. 2013 Apr 29;8(4):e62856. doi: 10.1371/journal.pone.0062856. Print 2013.
9
LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.
Gigascience. 2019 Jan 1;8(1):giy157. doi: 10.1093/gigascience/giy157.
10
B-assembler: a circular bacterial genome assembler.
BMC Genomics. 2022 May 11;23(Suppl 4):361. doi: 10.1186/s12864-022-08577-7.

Cited by

1
Analysis of Software Read Cross-Contamination in DNBSEQ Data.
Biology (Basel). 2025 Jun 9;14(6):670. doi: 10.3390/biology14060670.
2
Complete Genome Sequence of the -Specific Bacteriophage BRock.
Microbiol Resour Announc. 2020 Aug 27;9(35):e00624-20. doi: 10.1128/MRA.00624-20.
3
Optimal sequencing depth design for whole genome re-sequencing in pigs.
BMC Bioinformatics. 2019 Nov 8;20(1):556. doi: 10.1186/s12859-019-3164-z.
4
Complete Genome Sequences of Bacteriophages Wes44 and Carmen17.
Microbiol Resour Announc. 2019 Mar 21;8(12):e01103-18. doi: 10.1128/MRA.01103-18.
5
Complete Genome Sequence of Bacillus Phage Belinda from Grand Cayman Island.
Genome Announc. 2016 Oct 13;4(5):e00571-16. doi: 10.1128/genomeA.00571-16.
6
Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge.
Genome Announc. 2016 Aug 18;4(4):e00572-16. doi: 10.1128/genomeA.00572-16.
7
Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge.
Genome Announc. 2016 Apr 21;4(2):e01728-15. doi: 10.1128/genomeA.01728-15.
8
Complete genome sequence of a mosaic bacteriophage, waukesha92.
Genome Announc. 2014 Aug 21;2(4):e00339-14. doi: 10.1128/genomeA.00339-14.

References

1
Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology.
PLoS One. 2012;7(11):e47768. doi: 10.1371/journal.pone.0047768. Epub 2012 Nov 21.
2
COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly.
Bioinformatics. 2012 Nov 15;28(22):2870-4. doi: 10.1093/bioinformatics/bts563. Epub 2012 Oct 8.
3
SEQuel: improving the accuracy of genome assemblies.
Bioinformatics. 2012 Jun 15;28(12):i188-96. doi: 10.1093/bioinformatics/bts219.
4
pIRS: Profile-based Illumina pair-end reads simulator.
Bioinformatics. 2012 Jun 1;28(11):1533-5. doi: 10.1093/bioinformatics/bts187. Epub 2012 Apr 15.
5
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Genome Res. 2012 Mar;22(3):557-67. doi: 10.1101/gr.131383.111. Epub 2012 Jan 6.
6
Database resources of the National Center for Biotechnology Information.
Nucleic Acids Res. 2012 Jan;40(Database issue):D13-25. doi: 10.1093/nar/gkr1184. Epub 2011 Dec 2.
7
The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.
Nucleic Acids Res. 2012 Jan;40(Database issue):D571-9. doi: 10.1093/nar/gkr1100. Epub 2011 Dec 1.
8
Repetitive DNA and next-generation sequencing: computational challenges and solutions.
Nat Rev Genet. 2011 Nov 29;13(1):36-46. doi: 10.1038/nrg3117.
9
FLASH: fast length adjustment of short reads to improve genome assemblies.
Bioinformatics. 2011 Nov 1;27(21):2957-63. doi: 10.1093/bioinformatics/btr507. Epub 2011 Sep 7.
10
Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries.
Genome Biol. 2011;12(2):R18. doi: 10.1186/gb-2011-12-2-r18. Epub 2011 Feb 21.