Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA.
mBio. 2022 Feb 22;13(1):e0316121. doi: 10.1128/mbio.03161-21. Epub 2022 Jan 11.
Colorectal cancer is a common and deadly disease in the United States, accounting for over 50,000 deaths in 2020. This progressive disease is highly preventable with early detection and treatment, but many people do not comply with the recommended screening guidelines. The gut microbiome has emerged as a promising target for noninvasive detection of colorectal cancer. Most microbiome-based classification efforts utilize taxonomic abundance data from operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) with the goal of increasing taxonomic resolution. However, it is unknown which taxonomic resolution is optimal for microbiome-based classification of colorectal cancer. To address this question, we used a reproducible machine learning framework to quantify classification performance of models based on data annotated to phylum, class, order, family, genus, OTU, and ASV levels. We found that model performance increased with increasing taxonomic resolution up to the family level, where performance was equal (P > 0.05) among family (mean area under the receiver operating characteristic curve [AUROC], 0.689), genus (mean AUROC, 0.690), and OTU (mean AUROC, 0.693) levels, before decreasing at the ASV level (P < 0.05; mean AUROC, 0.676). These results demonstrate a trade-off between taxonomic resolution and prediction performance, where coarse taxonomic resolution (e.g., phylum) is not distinct enough, but fine resolution (e.g., ASV) is too individualized to accurately classify samples. Similar to the story of Goldilocks and the three bears (L. B. Cauley, 1981), mid-range resolution (i.e., family, genus, and OTU) is "just right" for optimal prediction of colorectal cancer from microbiome data.
Despite being highly preventable, colorectal cancer remains a leading cause of cancer-related death in the United States. Low-cost, noninvasive detection methods could greatly improve our ability to identify and treat early stages of disease. The microbiome has shown promise as a resource for detection of colorectal cancer. Research on the gut microbiome tends to focus on improving our ability to profile taxa at species- and strain-level resolution. However, we found that finer resolution impedes the ability to predict colorectal cancer based on the gut microbiome. These results highlight the need to consider the appropriate taxonomic resolution for microbiome analyses and show that finer resolution is not always more informative.
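The two operations behind the comparison described above, pooling fine-grained features into a coarser taxonomic level and scoring a classifier by AUROC, can be illustrated with a minimal sketch. It is not the authors' framework: the function names, the toy taxonomy, and the example counts and scores are all hypothetical, and AUROC is computed here through its pairwise-ranking definition rather than with any particular library.

```python
from collections import defaultdict


def collapse_to_rank(counts, taxonomy, rank):
    """Sum each sample's feature counts into the taxon assigned at `rank`.

    counts   : list of dicts, one per sample, mapping feature id -> count
    taxonomy : dict mapping feature id -> dict of rank name -> taxon name
    rank     : rank name to pool at, e.g. "family" (hypothetical labels)
    """
    collapsed = []
    for sample in counts:
        totals = defaultdict(float)
        for feature, count in sample.items():
            totals[taxonomy[feature][rank]] += count
        collapsed.append(dict(totals))
    return collapsed


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_taxonomy = {
    "otu1": {"family": "Lachnospiraceae"},
    "otu2": {"family": "Lachnospiraceae"},
    "otu3": {"family": "Fusobacteriaceae"},
}
toy_counts = [{"otu1": 5, "otu2": 3, "otu3": 2}]

family_counts = collapse_to_rank(toy_counts, toy_taxonomy, "family")
toy_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In a comparison like the one reported, the same pooling step would be repeated for each level from phylum to genus, a model trained on each resulting table, and the held-out AUROC values compared across levels.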