HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment.

Authors

Jordi Abante, Noushin Ghaffari, Charles D Johnson, Aniruddha Datta

Affiliations

Whitaker Biomedical Engineering Institute, Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA.

Center for Bioinformatics and Genomic Systems Engineering (CBGSE), 101 Gateway Blvd., College Station, TX, USA.

Publication information

BMC Genomics. 2017 Sep 5;18(1):694. doi: 10.1186/s12864-017-3965-2.

DOI:10.1186/s12864-017-3965-2
PMID:28874136
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5584555/
Abstract

BACKGROUND

The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers.

METHODS

Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology.
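The idea in this paragraph can be illustrated with a toy model. The sketch below is not HiMMe's implementation: it assumes the hidden state is the true k-mer, that transitions are estimated from a trusted training sequence, and that each observed base is emitted with a uniform substitution error rate. The function names (`train_transitions`, `contig_log_likelihood`), the pseudocount, and the error model are all hypothetical choices made for illustration. The score is the log-likelihood of a contig computed with the standard scaled forward algorithm.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def train_transitions(reference, k=2, pseudocount=1.0):
    """Estimate P(next base | previous k bases) from a trusted sequence.

    Assumption: a simple k-th order Markov chain with additive smoothing.
    Windows containing symbols outside ACGT are skipped.
    """
    contexts = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {c: dict.fromkeys(ALPHABET, pseudocount) for c in contexts}
    for i in range(len(reference) - k):
        context, nxt = reference[i:i + k], reference[i + k]
        if context in counts and nxt in ALPHABET:
            counts[context][nxt] += 1
    return {
        c: {b: n / sum(row.values()) for b, n in row.items()}
        for c, row in counts.items()
    }


def emission(observed, true_base, error_rate):
    """Assumed noise model: uniform substitution errors."""
    return 1.0 - error_rate if observed == true_base else error_rate / 3.0


def contig_log_likelihood(contig, trans, k=2, error_rate=0.01):
    """Log-likelihood of an observed contig under the toy HMM.

    Uses the scaled forward algorithm; requires 0 < error_rate < 1.
    """
    if len(contig) < k:
        raise ValueError("contig is shorter than k")
    states = ["".join(p) for p in product(ALPHABET, repeat=k)]

    # Uniform prior over the first true k-mer, times the emissions of
    # the first k observed bases.
    alpha = {}
    for s in states:
        p = 1.0 / len(states)
        for obs, true_base in zip(contig[:k], s):
            p *= emission(obs, true_base, error_rate)
        alpha[s] = p
    scale = sum(alpha.values())
    log_lik = math.log(scale)
    alpha = {s: a / scale for s, a in alpha.items()}

    # Each step shifts the true k-mer by one base and emits one observation.
    for obs in contig[k:]:
        nxt = dict.fromkeys(states, 0.0)
        for s, a in alpha.items():
            for b in ALPHABET:
                nxt[s[1:] + b] += a * trans[s][b] * emission(obs, b, error_rate)
        scale = sum(nxt.values())
        log_lik += math.log(scale)
        alpha = {s: v / scale for s, v in nxt.items()}
    return log_lik


if __name__ == "__main__":
    trans = train_transitions("ACGT" * 50, k=2)
    for name, seq in [("patterned", "ACGT" * 10), ("scrambled", "AACCGGTT" * 5)]:
        print(name, contig_log_likelihood(seq, trans, k=2) / len(seq))
```

Dividing the log-likelihood by the contig length gives a per-base value that can be compared across contigs of different lengths; a contig that follows the trained pattern receives a higher value than one that does not.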

RESULTS

Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources.

CONCLUSIONS

Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.
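As a rough illustration of how a per-contig metric and an overall assembly score might be used together, consider the sketch below. The abstract does not say how HiMMe combines contig scores, so the length-weighted average used here is an assumption, and `ContigScore`, `assembly_score`, and `select_contigs` are hypothetical names, not part of the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigScore:
    """Hypothetical record: one contig and its model log-likelihood."""
    name: str
    length: int
    log_likelihood: float

    @property
    def per_base(self):
        # Length-normalized score, comparable across contigs.
        return self.log_likelihood / self.length


def assembly_score(scores):
    """Assumed aggregation: total log-likelihood over total length."""
    total_length = sum(s.length for s in scores)
    if total_length == 0:
        raise ValueError("no bases to score")
    return sum(s.log_likelihood for s in scores) / total_length


def select_contigs(scores, min_per_base):
    """Names of contigs at or above a per-base threshold, best first."""
    ranked = sorted(scores, key=lambda s: s.per_base, reverse=True)
    return [s.name for s in ranked if s.per_base >= min_per_base]


if __name__ == "__main__":
    scores = [
        ContigScore("contig_1", 100, -50.0),
        ContigScore("contig_2", 300, -450.0),
    ]
    print(assembly_score(scores))
    print(select_contigs(scores, min_per_base=-1.0))
```

The threshold is a free parameter in this sketch; in practice it would have to be calibrated against assemblies of known quality.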

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/3e260c242ae6/12864_2017_3965_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/052bb20c8c0c/12864_2017_3965_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/2812b9e717b7/12864_2017_3965_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/3560ee3994e1/12864_2017_3965_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/436d4eba965d/12864_2017_3965_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/96f0b8a6ea0d/12864_2017_3965_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238f/5584555/af0a94335702/12864_2017_3965_Fig7_HTML.jpg


