Henderson Gemma, Yilmaz Pelin, Kumar Sandeep, Forster Robert J, Kelly William J, Leahy Sinead C, Guan Le Luo, Janssen Peter H
Grasslands Research Centre, AgResearch, Palmerston North, New Zealand.
Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany.
PeerJ. 2019 Mar 5;7:e6496. doi: 10.7717/peerj.6496. eCollection 2019.
The taxonomy and associated nomenclature of many taxa of rumen bacteria are poorly defined within databases of 16S rRNA genes. This lack of resolution results in inadequate definition of microbial community structures, with large parts of the community designated as incertae sedis, unclassified, or uncultured within families, orders, or even classes. We have begun resolving these poorly defined groups of rumen bacteria so that they can be named for use in microbial community profiling. We used the previously reported global rumen census (GRC) dataset, which consists of >4.5 million partial bacterial 16S rRNA gene sequences amplified from 684 rumen samples and represents a wide range of animal hosts and diets. Representative sequences from the 8,985 largest operational taxonomic units (groups of sequences sharing >97% sequence similarity, together covering 97.8% of all sequences in the GRC dataset) were used to identify 241 pre-defined clusters (mainly at genus or family level) of abundant rumen bacteria in the ARB SILVA 119 framework. A total of 99 of these clusters (containing 63.8% of all GRC sequences) had no unique taxonomic identifier or an inadequate one, and each was given a unique name. We assessed this improved framework by comparing its taxonomic assignments of the bacterial 16S rRNA gene sequence data in the GRC dataset with those made using the original SILVA 119 framework and three other frameworks. The two SILVA frameworks performed best at assigning sequences to genus-level taxa. The SILVA 119 framework allowed 55.4% of the sequence data to be assigned to 751 uniquely identifiable genus-level groups. The improved framework increased this to 87.1% of all sequences, assigned to 871 uniquely identifiable genus-level groups. The new designations were included in the SILVA 123 release (https://www.arb-silva.de/documentation/release-123/) and will be perpetuated in future releases.
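The comparison metric reported above (the percentage of sequences assigned to a uniquely identifiable genus-level group, and the number of such groups) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes semicolon-delimited taxonomy paths with genus as the sixth rank, and it treats labels containing "uncultured", "unclassified", or "incertae sedis" as uninformative. The function names and the `toy` table are hypothetical and exist only for illustration.

```python
from collections import Counter

# Labels treated as uninformative at genus level (an assumption of this sketch).
PLACEHOLDERS = ("uncultured", "unclassified", "incertae sedis")


def genus_label(taxonomy, genus_rank=6):
    """Return the genus-level label of a semicolon-delimited path, or None.

    None means the path stops above genus level or carries a placeholder
    label instead of a uniquely identifiable name.
    """
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
    if len(ranks) < genus_rank:
        return None
    genus = ranks[genus_rank - 1]
    if any(p in genus.lower() for p in PLACEHOLDERS):
        return None
    return genus


def genus_assignment_summary(otu_table):
    """Summarise an iterable of (taxonomy_path, sequence_count) pairs.

    Returns (percent of sequences with a genus-level label,
             number of distinct genus-level labels).
    """
    total = 0
    per_genus = Counter()
    for taxonomy, count in otu_table:
        total += count
        genus = genus_label(taxonomy)
        if genus is not None:
            per_genus[genus] += count
    assigned = sum(per_genus.values())
    percent = 100.0 * assigned / total if total else 0.0
    return percent, len(per_genus)


# Hypothetical toy table, not data from the GRC dataset.
toy = [
    ("Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella", 600),
    ("Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminococcus", 250),
    ("Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae;uncultured", 100),
    ("Bacteria;Firmicutes;Clostridia;Clostridiales", 50),
]

percent, n_groups = genus_assignment_summary(toy)
print(f"{percent:.1f}% of sequences assigned to {n_groups} genus-level groups")
```

Running the same summary on assignments made with two different reference frameworks would give the kind of side-by-side figures quoted in the abstract, although the actual study may have defined "uniquely identifiable" differently.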