Science. 2010 May 21;328(5981):994-9. doi: 10.1126/science.1183605.
The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from the initial sequencing of 178 microbial reference genomes. From the 547,968 predicted polypeptides that correspond to the gene complement of these strains, we defined previously unidentified ("novel") polypeptides as those with both an unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms, based on the match criteria used. Pan-genome analysis suggests that we are still far from saturating microbial species genetic data sets. The metrics and standards used by our group for quality assurance are also presented.
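The two novelty criteria stated above (unmasked length greater than 100 amino acids, and no BLASTP match to a nonreference entry) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Polypeptide` record, its field names, and the helper functions are hypothetical, and the BLASTP search is assumed to have been run elsewhere, with its outcome reduced to a single boolean per polypeptide. The only figures taken from the abstract are the three counts used in the percentage checks.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Polypeptide:
    """Hypothetical record for one predicted polypeptide."""
    identifier: str             # illustrative ID, not a real accession
    unmasked_length: int        # amino acids remaining after masking
    has_nonreference_hit: bool  # True if BLASTP matched a nonreference entry


# Threshold stated in the abstract: strictly greater than 100 amino acids.
MIN_UNMASKED_LENGTH = 100


def is_novel(p: Polypeptide) -> bool:
    """Apply both criteria: long enough and no nonreference BLASTP match."""
    return p.unmasked_length > MIN_UNMASKED_LENGTH and not p.has_nonreference_hit


def select_novel(polypeptides: Iterable[Polypeptide]) -> List[Polypeptide]:
    """Keep only the polypeptides that satisfy both criteria."""
    return [p for p in polypeptides if is_novel(p)]


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    toy = [
        Polypeptide("p1", 150, False),  # passes both criteria
        Polypeptide("p2", 150, True),   # rejected: has a nonreference hit
        Polypeptide("p3", 100, False),  # rejected: length is not > 100
        Polypeptide("p4", 101, False),  # passes both criteria
    ]
    print([p.identifier for p in select_novel(toy)])

    # Counts reported in the abstract.
    print(f"unique among novel: {percent(29_987, 30_867):.1f}%")
    print(f"novel among predicted: {percent(30_867, 547_968):.1f}%")
```

The last line is a derived figure rather than one the abstract reports: the 30,867 novel polypeptides make up a little under 6% of the 547,968 predicted polypeptides.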