Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

Author information

Martin Guillaume, Baurens Franc-Christophe, Droc Gaëtan, Rouard Mathieu, Cenci Alberto, Kilian Andrzej, Hastie Alex, Doležel Jaroslav, Aury Jean-Marc, Alberti Adriana, Carreel Françoise, D'Hont Angélique

Affiliations

CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.

Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France.

Publication information

BMC Genomics. 2016 Mar 16;17:243. doi: 10.1186/s12864-016-2579-4.

Abstract

BACKGROUND

Recent advances in genomics indicate that a majority of genome sequences, and their long-range interactions, are functionally significant. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata).

RESULTS

We developed a modular bioinformatics pipeline for improving genome sequence assemblies that can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools, which rely on global parameters, these tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation was projected onto the new assembly from two pre-existing annotations. This approach reduced the total number of Musa scaffolds from 7513 to 1532 (i.e. by 80%) and increased the N50 from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared with 70% previously. Unknown sites (N) were reduced from 17.3% to 10.0%.
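The scaffold statistics quoted in the results (an N50 value with a scaffold count in parentheses, and a percentage reduction in scaffold number) can be derived from a plain list of scaffold lengths. The following is a minimal Python sketch of that arithmetic, not code from the pipeline described here. The function names `n50` and `percent_reduction` and the toy lengths are illustrative assumptions. Only the scaffold counts 7513 and 1532 come from the abstract.

```python
def n50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least half of the assembly.
    L50 is the number of scaffolds in that set (the count the abstract
    gives in parentheses next to each N50).
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count


def percent_reduction(before, after):
    """Percentage by which a count dropped from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50(toy_lengths)

# Scaffold counts reported in the abstract: 7513 before, 1532 after.
scaffold_drop = percent_reduction(7513, 1532)
print(f"toy N50={toy_n50}, L50={toy_l50}, scaffold reduction={scaffold_drop:.1f}%")
```

With the reported counts, the reduction comes out at roughly 79.6%, which agrees with the rounded 80% stated in the abstract.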

CONCLUSION

The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71a6/4793746/358dc82018f4/12864_2016_2579_Fig1_HTML.jpg
