The complex task of choosing a de novo assembly: lessons from fungal genomes.

Author information

Gallo Juan Esteban, Muñoz José Fernando, Misas Elizabeth, McEwen Juan Guillermo, Clay Oliver Keatinge

Affiliations

Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia.

Cellular & Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia; Institute of Biology, Universidad de Antioquia, Medellín, Colombia.

Publication information

Comput Biol Chem. 2014 Dec;53 Pt A:97-107. doi: 10.1016/j.compbiolchem.2014.08.014. Epub 2014 Aug 29.

DOI:10.1016/j.compbiolchem.2014.08.014

PMID:25262360

Abstract

Selecting the values of parameters used by de novo genomic assembly programs, or choosing an optimal de novo assembly from several runs obtained with different parameters or programs, are tasks that can require complex decision-making. A key parameter that must be supplied to typical next generation sequencing (NGS) assemblers is the k-mer length, i.e., the word size that determines which de Bruijn graph the program should map out and use. The topic of assembly selection criteria was recently revisited in the Assemblathon 2 study (Bradnam et al., 2013). Although no clear message was delivered with regard to optimal k-mer lengths, it was shown with examples that it is sometimes important to decide if one is most interested in optimizing the sequences of protein-coding genes (the gene space) or in optimizing the whole genome sequence including the intergenic DNA, as what is best for one criterion may not be best for the other. In the present study, our aim was to better understand how the assembly of unicellular fungi (which are typically intermediate in size and complexity between prokaryotes and metazoan eukaryotes) can change as one varies the k-mer values over a wide range. We used two different de novo assembly programs (SOAPdenovo2 and ABySS), and simple assembly metrics that also focused on success in assembling the gene space and repetitive elements. A recent increase in Illumina read length to around 150 bp allowed us to attempt de novo assemblies with a larger range of k-mers, up to 127 bp. We applied these methods to Illumina paired-end sequencing read sets of fungal strains of Paracoccidioides brasiliensis and other species. By visualizing the results in simple plots, we were able to track the effect of changing k-mer size and assembly program, and to demonstrate how such plots can readily reveal discontinuities or other unexpected characteristics that assembly programs can present in practice, especially when they are used in a traditional molecular microbiology laboratory with a 'genomics corner'. Here we propose and apply a component of a first pass validation methodology for benchmarking and understanding fungal genome de novo assembly processes.
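
The role of the k-mer length mentioned in the abstract can be illustrated with a toy sketch. This is not the implementation used by SOAPdenovo2 or ABySS, and it is not taken from the paper: the function names, the example sequence, and the "branching nodes" summary are assumptions made for illustration, and reverse complements, sequencing errors, and coverage are ignored. It builds a de Bruijn graph whose nodes are (k-1)-mers and whose edges are the observed k-mers, then reports how the graph changes as k varies.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read; each window is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def summarize(reads, k):
    """Count distinct edges (k-mers) and nodes with more than one successor."""
    graph = de_bruijn_graph(reads, k)
    edges = sum(len(successors) for successors in graph.values())
    branching = sum(1 for successors in graph.values() if len(successors) > 1)
    return {"k": k, "edges": edges, "branching_nodes": branching}


if __name__ == "__main__":
    # Hypothetical sequence containing the 4-base repeat "GCTT" in two contexts.
    toy_reads = ["TAGCTTGACCGCTTCA"]
    for k in (3, 5, 6):
        print(summarize(toy_reads, k))
```

In this toy case the graph has a branching node while k - 1 is no longer than the repeat, and the branch disappears once k - 1 exceeds the repeat length. Real read sets behave less cleanly, since larger k also requires longer error-free overlaps between reads, which is one reason a sweep over many k values can be informative.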

Similar articles

The complex task of choosing a de novo assembly: lessons from fungal genomes.

Comput Biol Chem. 2014 Dec;53 Pt A:97-107. doi: 10.1016/j.compbiolchem.2014.08.014. Epub 2014 Aug 29.

Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.

Sci Rep. 2019 Oct 16;9(1):14882. doi: 10.1038/s41598-019-51284-9.

Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms.

BMC Bioinformatics. 2012 Jul 18;13:170. doi: 10.1186/1471-2105-13-170.

A biologist's guide to de novo genome assembly using next-generation sequence data: A test with fungal genomes.

J Microbiol Methods. 2011 Sep;86(3):368-75. doi: 10.1016/j.mimet.2011.06.019. Epub 2011 Jul 3.

HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly.

BMC Genomics. 2014;15 Suppl 10(Suppl 10):S9. doi: 10.1186/1471-2164-15-S10-S9. Epub 2014 Dec 12.

Benchmarking and Assessment of Eight Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2.

OMICS. 2022 Jul;26(7):372-381. doi: 10.1089/omi.2022.0042. Epub 2022 Jun 28.

RResolver: efficient short-read repeat resolution within ABySS.

BMC Bioinformatics. 2022 Jun 21;23(1):246. doi: 10.1186/s12859-022-04790-z.

Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus.

PLoS One. 2016 Apr 7;11(4):e0153104. doi: 10.1371/journal.pone.0153104. eCollection 2016.

BASE: a practical de novo assembler for large genomes using long NGS reads.

BMC Genomics. 2016 Aug 31;17 Suppl 5(Suppl 5):499. doi: 10.1186/s12864-016-2829-5.

Cited by

Next-Generation Sequencing Applications for the Study of Fungal Pathogens.

Microorganisms. 2022 Sep 21;10(10):1882. doi: 10.3390/microorganisms10101882.

Genome-Enhanced Detection and Identification (GEDI) of plant pathogens.

PeerJ. 2018 Feb 22;6:e4392. doi: 10.7717/peerj.4392. eCollection 2018.
