INESC TEC - Institute for Systems and Computer Engineering, Technology and Science, 4200-465, Porto, Portugal.
FEUP - Faculty of Engineering, University of Porto, 4200-465, Porto, Portugal.
Sci Rep. 2023 Jul 21;13(1):11821. doi: 10.1038/s41598-023-38670-0.
Emerging evidence of a relationship between microbiome composition and the development of numerous diseases, including cancer, has increased interest in the study of the human microbiome. Breakthroughs in DNA sequencing technology have enabled microbiome studies with large numbers of samples, which in turn call for more sophisticated data-analysis tools to examine this complex relationship. The aim of this work was to develop a machine learning-based approach to distinguish the type of cancer from tissue-specific microbial information, assessing the value of the human microbiome as predictive information for cancer identification. For this purpose, Random Forest algorithms were trained to classify five types of cancer (head and neck, esophageal, stomach, colon, and rectum) using samples from The Cancer Microbiome Atlas database. One-versus-all and multi-class classification studies were conducted to evaluate the discriminative capability of the microbial data across increasing levels of cancer site specificity; accurate sample classification became progressively more difficult as specificity increased. Random Forest models achieved promising performance when predicting head and neck, stomach, and colon cancer cases, with colon cancer reaching accuracy scores above 90% across the different studies conducted. However, esophageal and rectum cancers were harder to discriminate: the models failed to adequately differentiate rectum from colon cancer cases, and esophageal cancer from head and neck and stomach cancers. These results suggest that anatomically adjacent cancers can be harder to identify because of microbial similarities. Despite these limitations, microbiome data analysis using machine learning may advance novel strategies to improve cancer detection and prevention and to decrease disease burden.
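The two study designs named in the abstract, multi-class and one-versus-all classification with Random Forests, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. It assumes scikit-learn is available and uses a synthetic feature matrix from `make_classification` as a stand-in for microbial abundance data. The sample count, feature count, hyperparameters, and the `fit_forest` helper are all hypothetical choices made for the example, and the printed scores say nothing about the published results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Cancer sites named in the abstract, used here only as class labels.
SITES = ["head_and_neck", "esophageal", "stomach", "colon", "rectum"]

# Synthetic stand-in for a samples-by-taxa abundance matrix (NOT real TCMA data).
X, y = make_classification(
    n_samples=500, n_features=40, n_informative=12,
    n_classes=len(SITES), n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_forest(features, labels):
    """Fit a Random Forest with illustrative, untuned hyperparameters."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return model.fit(features, labels)


# Multi-class study: a single model predicts the cancer site directly.
multi_model = fit_forest(X_train, y_train)
multi_accuracy = accuracy_score(y_test, multi_model.predict(X_test))

# One-versus-all study: one binary model per site (target site vs. the rest).
ova_accuracy = {}
for index, site in enumerate(SITES):
    binary_train = (y_train == index).astype(int)
    binary_test = (y_test == index).astype(int)
    binary_model = fit_forest(X_train, binary_train)
    ova_accuracy[site] = accuracy_score(binary_test, binary_model.predict(X_test))

print(f"multi-class accuracy: {multi_accuracy:.3f}")
for site, score in ova_accuracy.items():
    print(f"{site} vs. rest accuracy: {score:.3f}")
```

In a one-versus-all setup the negative class pools every other site, so the binary tasks are imbalanced and plain accuracy can look optimistic. On real data, per-class metrics or a confusion matrix would more clearly show which sites are confused with one another.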