Toward an efficient method of identifying core genes for evolutionary and functional microbial phylogenies.

Author information

Biostatistics Department, Harvard School of Public Health, Harvard University, Boston, Massachusetts, United States of America.

Publication information

PLoS One. 2011;6(9):e24704. doi: 10.1371/journal.pone.0024704. Epub 2011 Sep 12.

Abstract

Microbial community metagenomes and individual microbial genomes are becoming increasingly accessible by means of high-throughput sequencing. Assessing organismal membership within a community is typically performed using one or a few taxonomic marker genes such as the 16S rDNA, and these same genes are also employed to reconstruct molecular phylogenies. There is thus a growing need to bioinformatically catalog strongly conserved core genes that can serve as effective taxonomic markers, to assess the agreement among phylogenies generated from different core genes, and to characterize the biological functions enriched within core genes and thus conserved throughout large microbial clades. We present a method to recursively identify core genes (i.e., genes ubiquitous within a microbial clade) in a high-throughput manner from a large number of complete input genomes. We analyzed over 1,100 genomes to produce core gene sets spanning 2,861 bacterial and archaeal clades, ranging in size from one to >2,000 genes in inverse correlation with the α-diversity (total phylogenetic branch length) spanned by each clade. These cores are enriched as expected for housekeeping functions including translation, transcription, and replication, in addition to significant representations of regulatory, chaperone, and conserved uncharacterized proteins. In agreement with previous manually curated core gene sets, phylogenies constructed from one or more of these core genes agree with those built using 16S rDNA sequence similarity, suggesting that systematic core gene selection can be used to optimize both comparative genomics and determination of microbial community structure. Finally, we examine functional phylogenies constructed by clustering genomes by the presence or absence of orthologous gene families and show that they provide an informative complement to standard sequence-based molecular phylogenies.
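The two ideas in the abstract, recursive core gene identification and presence/absence-based functional comparison, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a strict definition of "core" (a gene family present in every genome of a clade), a taxonomy given as nested tuples, and a Jaccard distance as the presence/absence dissimilarity. The genome names, gene family labels, and function names (`clade_cores`, `jaccard_distance`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A tree is either a genome name (leaf) or a tuple of subtrees (a clade).
Tree = Union[str, Tuple["Tree", ...]]
CoreMap = Dict[Tuple[str, ...], FrozenSet[str]]


def clade_cores(tree: Tree, genomes: Dict[str, Set[str]],
                out: CoreMap) -> Tuple[Tuple[str, ...], FrozenSet[str]]:
    """Recursively compute the core of every clade in `tree`.

    Assumption: a clade's core is the strict intersection of its
    children's cores, so it only needs the results already computed
    one level down. Each clade is keyed in `out` by its leaf names.
    """
    if isinstance(tree, str):
        # A single genome: its "core" is simply its own gene families.
        return (tree,), frozenset(genomes[tree])
    names: Tuple[str, ...] = ()
    core = None
    for child in tree:
        child_names, child_core = clade_cores(child, genomes, out)
        names += child_names
        core = child_core if core is None else core & child_core
    out[names] = core
    return names, core


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Presence/absence dissimilarity between two gene family sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Toy input (hypothetical genomes and gene family labels).
genomes = {
    "A": {"rpoB", "rpsL", "gyrA", "lacZ"},
    "B": {"rpoB", "rpsL", "gyrA"},
    "C": {"rpoB", "rpsL", "nifH"},
}
tree = (("A", "B"), "C")

cores: CoreMap = {}
clade_cores(tree, genomes, cores)
for clade, core in cores.items():
    print(clade, sorted(core))
```

In this toy example the core shrinks as the clade widens, which mirrors the inverse relationship between core size and clade diversity reported in the abstract. The pairwise `jaccard_distance` values could be fed to any hierarchical clustering routine to obtain a presence/absence tree; which distance and clustering method the paper actually used is not stated in the abstract.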

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/673a/3171473/414413a362ff/pone.0024704.g001.jpg
