Rosa Bruce A, Mihindukulasuriya Kathie, Hallsworth-Pepin Kymberlie, Wollam Aye, Martin John, Snowden Caroline, Dunne William Michael, Weinstock George M, Burnham C A, Mitreva Makedonka
McDonnell Genome Institute at Washington University, St. Louis, Missouri, USA.
Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA.
mSystems. 2020 Feb 25;5(1):e00096-20. doi: 10.1128/mSystems.00096-20.
Whole-genome bacterial sequences are required to better understand microbial functions, niche-specific bacterial metabolism, and disease states. Although genomic sequences are available for many of the human-associated bacteria from commonly tested body habitats (e.g., feces), as few as 13% of bacterium-derived reads from other sites such as the skin map to known bacterial genomes. To facilitate a better characterization of metagenomic shotgun reads from underrepresented body sites, we collected over 10,000 bacterial isolates originating from 14 human body habitats, identified novel taxonomic groups based on full-length 16S rRNA gene sequences, clustered the sequences to ensure that no individual taxonomic group was overselected for sequencing, prioritized bacteria from underrepresented body sites (such as skin and respiratory and urinary tracts), and sequenced and assembled genomes for 665 new bacterial strains. Here, we show that addition of these genomes improved read mapping rates of Human Microbiome Project (HMP) metagenomic samples by nearly 30% for a previously underrepresented phylum, and 27.5% of the novel genomes generated here had high representation in at least one of the tested HMP samples, compared to 12.5% of the sequences in the public databases, indicating an enrichment of useful novel genomic sequences resulting from the prioritization procedure. As our understanding of the human microbiome continues to improve and to enter the realm of therapy development, targeted approaches such as this to improve genomic databases will increase in importance from both an academic and a clinical perspective.
The human microbiome plays a critically important role in health and disease, but current understanding of the mechanisms underlying the interactions between the varying microbiome and the different host environments is lacking. Having access to a database of fully sequenced bacterial genomes provides invaluable insights into microbial functions, but currently sequenced genomes for the human microbiome have largely come from a limited number of body sites (primarily feces), while other sites such as the skin, respiratory tract, and urinary tract are underrepresented, resulting in as little as 13% of bacterium-derived reads mapping to known bacterial genomes. Here, we sequenced and assembled 665 new bacterial genomes, prioritized from a larger database to select underrepresented body sites and bacterial taxa in the existing databases. As a result, we substantially improve mapping rates for samples from the Human Microbiome Project and provide an important contribution to human bacterial genomic databases for future studies.
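The prioritization described above (cluster isolates by 16S rRNA sequence, avoid overselecting any one taxonomic group, and favor underrepresented body sites) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout (`id`, `cluster`, `site`), the `PRIORITY_SITES` set, the `select_isolates` function, and the per-cluster cap `max_per_cluster` are all hypothetical names chosen for illustration, and the sketch assumes the 16S clustering has already been done so that each isolate carries a cluster label.

```python
from collections import defaultdict

# Assumption: priority sites are the underrepresented examples named in the abstract.
PRIORITY_SITES = {"skin", "respiratory tract", "urinary tract"}


def select_isolates(isolates, max_per_cluster=1):
    """Choose isolates for sequencing with a cap per 16S cluster.

    isolates: list of dicts with hypothetical keys "id", "cluster"
    (a precomputed 16S cluster label), and "site" (body habitat).
    """
    # Group isolates by their 16S cluster label.
    by_cluster = defaultdict(list)
    for iso in isolates:
        by_cluster[iso["cluster"]].append(iso)

    selected = []
    for cluster in sorted(by_cluster):
        # Isolates from priority sites sort first; ties are broken by id
        # so the selection is deterministic.
        ranked = sorted(
            by_cluster[cluster],
            key=lambda iso: (iso["site"] not in PRIORITY_SITES, iso["id"]),
        )
        # The cap keeps any single cluster from being overselected.
        selected.extend(ranked[:max_per_cluster])
    return selected


# Toy data, invented for illustration only.
isolates = [
    {"id": "A1", "cluster": "c1", "site": "feces"},
    {"id": "A2", "cluster": "c1", "site": "skin"},
    {"id": "B1", "cluster": "c2", "site": "feces"},
    {"id": "C1", "cluster": "c3", "site": "urinary tract"},
    {"id": "C2", "cluster": "c3", "site": "urinary tract"},
]
picked = [iso["id"] for iso in select_isolates(isolates)]
print(picked)  # ['A2', 'B1', 'C1']
```

With the default cap of one isolate per cluster, the skin isolate A2 is chosen over the fecal isolate A1 from the same cluster, and only one of the two urinary tract isolates in cluster c3 is kept. Raising `max_per_cluster` trades redundancy within a taxonomic group for broader strain coverage.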