GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.

Affiliation

The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, Delft.

Publication information

Bioinformatics. 2012 Jun 1;28(11):1429-37. doi: 10.1093/bioinformatics/bts175. Epub 2012 Apr 6.

DOI: 10.1093/bioinformatics/bts175
PMID: 22492642

Abstract

MOTIVATION

The increasing availability of second-generation high-throughput sequencing (HTS) technologies has sparked a growing interest in de novo genome sequencing. This in turn has fueled the need for reliable means of obtaining high-quality draft genomes from short-read sequencing data. The millions of reads usually involved in HTS experiments are first assembled into longer fragments called contigs, which are then scaffolded, i.e. ordered and oriented using additional information, to produce even longer sequences called scaffolds. Most existing scaffolders of HTS genome assemblies are not suited for using information other than paired reads to perform scaffolding. They use this limited information to construct scaffolds, often preferring scaffold length over accuracy, when faced with the tradeoff.

RESULTS

We present GRASS (GeneRic ASsembly Scaffolder), a novel algorithm for scaffolding second-generation sequencing assemblies that can use diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation-maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.
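
To give a feel for what an unconstrained binary quadratic program looks like in this setting, the following is a minimal toy sketch, not the GRASS formulation itself. It covers only contig orientation (not order or distance), assumes each link carries a made-up weight and a "same orientation / opposite orientation" flag, and finds the optimum by brute force rather than by the expectation-maximization procedure the abstract mentions. The names `violation_cost`, `best_orientation` and the example `links` are hypothetical.

```python
from itertools import product


def violation_cost(x, links):
    """Total weight of links whose orientation evidence is contradicted by x.

    x[i] is 0 or 1 (forward or reverse strand of contig i).
    Each link is (i, j, same, weight); same=True means the evidence
    suggests contigs i and j share an orientation.
    """
    cost = 0.0
    for i, j, same, weight in links:
        # Quadratic expression for XOR on binary variables:
        # 1 if the two orientations differ, 0 otherwise.
        differ = x[i] + x[j] - 2 * x[i] * x[j]
        cost += weight * (differ if same else 1 - differ)
    return cost


def best_orientation(n, links):
    """Exhaustively minimise the quadratic objective over binary vectors.

    Contig 0 is fixed to orientation 0, because flipping every contig
    at once gives an equivalent solution. Only feasible for small n.
    """
    best = None
    for tail in product((0, 1), repeat=n - 1):
        x = (0,) + tail
        cost = violation_cost(x, links)
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


# Hypothetical evidence for four contigs (weights are illustrative only).
links = [
    (0, 1, True, 5.0),   # contigs 0 and 1: same orientation
    (1, 2, False, 4.0),  # contigs 1 and 2: opposite orientation
    (0, 2, True, 1.0),   # weak link that conflicts with the two above
    (2, 3, True, 3.0),   # contigs 2 and 3: same orientation
]

cost, orientation = best_orientation(4, links)
print(cost, orientation)
```

In this toy instance the weak link between contigs 0 and 2 cannot be satisfied together with the stronger ones, so the minimiser sacrifices it. A real scaffolder would need a scalable solver and would also have to handle order and distance, which this sketch leaves out.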

AVAILABILITY

GRASS source code is freely available from http://code.google.com/p/tud-scaffolding/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
