Optimizing hybrid assembly of next-generation sequence data from Enterococcus faecium: a microbe with highly divergent genome.

Authors

Wang Yajun, Yu Yao, Pan Bohu, Hao Pei, Li Yixue, Shao Zhifeng, Xu Xiaogang, Li Xuan

Affiliation

Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai 200240, China.

Publication

BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S21. doi: 10.1186/1752-0509-6-S3-S21. Epub 2012 Dec 17.

Abstract

BACKGROUND

Sequencing of bacterial genomes has become an essential approach for studying pathogen virulence and the phylogenetic relationships among closely related strains. The bacterium Enterococcus faecium has emerged as an important nosocomial pathogen that is often associated with resistance to common antibiotics in hospitals. With its highly divergent gene content, it presents a challenge to next-generation sequencing (NGS) technologies, which feature high throughput and shorter read lengths. This study was designed to investigate the properties and systematic biases of NGS technologies and to evaluate critical parameters influencing the outcomes of hybrid assemblies using combinations of NGS data.

RESULTS

A hospital strain of E. faecium was sequenced using three different NGS platforms, 454 GS-FLX, Illumina GAIIx, and ABI SOLiD4.0, to approximately 28-, 500-, and 400-fold coverage depth, respectively. We built a pipeline that merged contigs from each NGS dataset into hybrid assemblies. The results revealed that each single-platform NGS assembly had a ceiling in continuity that could not be overcome by simply increasing coverage depth. Each NGS technology displayed some intrinsic properties, e.g. base-calling errors and systematic bias. The gaps and low-coverage regions of each NGS assembly were associated with lower GC content. To optimize the hybrid assembly approach, we tested varying amounts and different combinations of NGS data and obtained optimal conditions for assembly continuity. We also showed, for the first time, that SOLiD data could help produce much improved assemblies of the E. faecium genome using the hybrid approach when combined with other types of NGS data.
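The abstract does not describe how its metrics were computed, so the following is only a minimal sketch of how fold coverage, assembly continuity, and GC content are commonly calculated. Using N50 as the continuity measure is an assumption (the abstract does not name one), and the function names are hypothetical rather than taken from the authors' pipeline.

```python
def fold_coverage(read_lengths, genome_size):
    """Average coverage depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def n50(contig_lengths):
    """Length of the contig at which the running total of contig lengths,
    taken from longest to shortest, first reaches half of the assembly size.
    Assumed here as a stand-in for "continuity"."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Applied per window along a reference, `gc_content` could be compared against per-window read depth to check whether low-coverage regions tend to have lower GC content, as reported above.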

CONCLUSIONS

The current study addressed the difficult issue of how to most effectively construct a complete microbial genome using today's state-of-the-art sequencing technologies. We characterized the sequence data and genome assembly from each NGS technology, tested conditions for hybrid assembly with combinations of NGS data, and obtained optimized parameters for achieving the most cost-efficient assembly. Our study helped form some guidelines to direct genomic work on other microorganisms and thus has important practical implications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e95/3524012/7bc3857c8b20/1752-0509-6-S3-S21-1.jpg
