School of Freshwater Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.
Department of Medicine, University of Chicago, Chicago, IL, USA.
Microbiome. 2018 Oct 18;6(1):185. doi: 10.1186/s40168-018-0568-3.
Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species. As a result, members of Clostridiales and Bacteroidales are particularly suitable for identifying sources of fecal contamination in environmental samples. However, a comprehensive evaluation of their predictive power is lacking, as are computational approaches that exploit it. Given the global public health concern for waterborne disease, accurate identification of fecal pollution sources is essential for effective risk assessment and management. Here, we use the random forest algorithm and 16S rRNA gene amplicon sequences assigned to Clostridiales and Bacteroidales to identify common fecal pollution sources. We benchmarked the accuracy, consistency, and sensitivity of our classification approach using fecal, environmental, and artificial in silico-generated samples.
Clostridiales and Bacteroidales classifiers were composed mainly of sequences that displayed differential distributions (host-preferred) among sewage, cow, deer, pig, cat, and dog sources. Each classifier correctly identified human and individual animal sources in approximately 90% of the fecal and environmental samples tested. Misclassifications resulted mostly from false-positive identification of cat and dog fecal signatures in host animals not used to build the classifiers, suggesting characterization of additional animals would improve accuracy. Random forest predictions were highly reproducible, reflecting the consistency of the bacterial signatures within each of the animal and sewage sources. Using in silico-generated samples, we could detect fecal bacterial signatures when the source dataset accounted for as little as ~0.5% of the assemblage, with ~0.04% of the sequences matching the classifiers. Finally, we developed a proxy to estimate proportions among sources, which allowed us to determine which sources contribute the most to observed fecal pollution.
Random forest classification with 16S rRNA gene amplicons offers a rapid, sensitive, and accurate solution for identifying host microbial signatures to detect human and animal fecal contamination in environmental samples.
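The in silico sensitivity test described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, sequence identifiers, read depth, and counts are all hypothetical, and the background is assumed to contain no classifier sequences. It mixes a fecal source into an environmental background at 0.5% of the reads and reports the fraction of reads that match a classifier sequence. Under that assumption, the two reported figures (~0.5% and ~0.04%) would imply that roughly 8% of source reads match the classifier, which is the ratio used for the toy source profile here.

```python
import random
from collections import Counter

def mix_in_silico(source_counts, background_counts, source_fraction, depth, seed=0):
    """Build an artificial sample of `depth` reads in which a fixed share of
    reads is resampled from the source profile and the rest from the background."""
    rng = random.Random(seed)
    src_ids, src_weights = zip(*source_counts.items())
    bg_ids, bg_weights = zip(*background_counts.items())
    n_source = round(depth * source_fraction)
    reads = rng.choices(src_ids, weights=src_weights, k=n_source)
    reads += rng.choices(bg_ids, weights=bg_weights, k=depth - n_source)
    return Counter(reads)

def classifier_match_fraction(sample_counts, classifier_seqs):
    """Fraction of all reads in the sample whose sequence is in the classifier set."""
    total = sum(sample_counts.values())
    matched = sum(n for seq, n in sample_counts.items() if seq in classifier_seqs)
    return matched / total

# Hypothetical profiles: 8 of every 100 source reads belong to a classifier sequence.
source = {"src_host_preferred": 8, "src_other": 92}
background = {"env_1": 60, "env_2": 40}

sample = mix_in_silico(source, background, source_fraction=0.005, depth=100_000)
frac = classifier_match_fraction(sample, {"src_host_preferred"})

# Share of source reads matching the classifier implied by the two reported figures,
# assuming no background reads match.
implied_source_match = 0.0004 / 0.005

print(f"classifier-matching reads in the sample: {frac:.4%}")
print(f"implied classifier share within the source: {implied_source_match:.0%}")
```

Fixing the number of source reads rather than drawing it at random keeps the dilution level exact, so any variation in the matched fraction comes only from resampling within the source profile.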