Normalized N50 assembly metric using gap-restricted co-linear chaining.

Affiliation

Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, P.O. Box 68 (Gustaf Hällströmin katu 2b), Helsinki, 00014, Finland.

Publication information

BMC Bioinformatics. 2012 Oct 3;13:255. doi: 10.1186/1471-2105-13-255.

DOI:10.1186/1471-2105-13-255
PMID:23031320
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3556137/
Abstract

BACKGROUND

Developing genome assembly tools requires comprehensive and efficiently computable validation measures to assess assembly quality. The most widely used measure, N50, summarizes an assembly by the length of the scaffold (or contig) that overlaps the midpoint of the concatenation of all scaffolds (contigs) in length order. For scaffold assemblies in particular, it is non-trivial to combine a correctness measure with the N50 value, and current methods for doing so are rather involved.
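
The N50 definition above can be made concrete with a minimal sketch. It assumes the scaffolds are concatenated from longest to shortest and that a scaffold ending exactly at the midpoint counts as overlapping it; the function name `n50` is illustrative and not taken from the authors' implementation.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the scaffold that covers the midpoint of the
    longest-first concatenation of all scaffolds (0 if there are none)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold whose cumulative end reaches the midpoint
        # is the one overlapping it.
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the value depends only on lengths, an assembly that joins unrelated regions into one long scaffold scores well on plain N50, which is the weakness the correctness measure is meant to address.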

RESULTS

We propose a simple but rigorous normalized N50 assembly metric that combines N50 with such a correctness measure: the assembly is split into as many parts as necessary for each part to align to the reference. For scalability, we first compute maximal local approximate matches between the scaffolds and the reference in a distributed manner, and then apply co-linear chaining to find a global alignment. The best alignment is removed from the scaffold, and the process is iterated on the remaining scaffold content so that the scaffold is split into correctly aligning parts. The proposed normalized N50 metric is then the N50 value computed over the final correctly aligning parts. As a side result of independent interest, we show how to modify co-linear chaining to restrict gaps, which produces a more sensible global alignment.
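
The splitting procedure described above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the local matches are already available as forward-strand anchors `(s_start, s_end, r_start, r_end)` with half-open coordinates, uses a simple quadratic dynamic program for chaining, scores a chain by the scaffold bases its anchors cover, and takes `max_gap` as a hypothetical bound on the gap between consecutive anchors in both the scaffold and the reference.

```python
from typing import List, Tuple

# (s_start, s_end, r_start, r_end): half-open intervals on scaffold and reference
Anchor = Tuple[int, int, int, int]


def best_chain(anchors: List[Anchor], max_gap: int) -> List[int]:
    """Indices of the highest-scoring co-linear chain whose consecutive
    anchors are at most max_gap apart on both sequences (O(n^2) DP)."""
    order = sorted(range(len(anchors)), key=lambda i: anchors[i][:2])
    score, prev = {}, {}
    for pos, j in enumerate(order):
        s0, s1, r0, _ = anchors[j]
        score[j], prev[j] = s1 - s0, None
        for i in order[:pos]:
            _, ps1, _, pr1 = anchors[i]
            s_gap, r_gap = s0 - ps1, r0 - pr1
            # Co-linear (no overlap, same order) and gap-restricted.
            if 0 <= s_gap <= max_gap and 0 <= r_gap <= max_gap:
                candidate = score[i] + (s1 - s0)
                if candidate > score[j]:
                    score[j], prev[j] = candidate, i
    if not score:
        return []
    end = max(score, key=score.get)
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]


def split_scaffold(anchors: List[Anchor], max_gap: int) -> List[int]:
    """Repeatedly take the best chain, record the scaffold span it covers,
    and discard anchors overlapping that span. Returns the part lengths."""
    remaining = list(anchors)
    parts = []
    while remaining:
        chain = best_chain(remaining, max_gap)
        lo, hi = remaining[chain[0]][0], remaining[chain[-1]][1]
        parts.append(hi - lo)
        remaining = [a for a in remaining if a[1] <= lo or a[0] >= hi]
    return parts


if __name__ == "__main__":
    # The third anchor maps about 4.8 kb further along the reference.
    anchors = [(0, 100, 0, 100), (110, 200, 105, 195), (210, 300, 5000, 5090)]
    print(split_scaffold(anchors, max_gap=50))
    print(split_scaffold(anchors, max_gap=10000))
```

In this toy input, a tight `max_gap` splits the scaffold where the reference position jumps, while a loose bound accepts the whole scaffold as one part. Under these assumptions, the normalized N50 would be the ordinary N50 computed over the part lengths collected from all scaffolds.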

CONCLUSIONS

We propose and implement a comprehensive and efficient approach to compute a metric that summarizes scaffold assembly correctness and length. Our implementation can be downloaded from http://www.cs.helsinki.fi/group/scaffold/normalizedN50/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf1f/3556137/efc52acbfa3d/1471-2105-13-255-1.jpg
