Kashyap Pragya, Raj Kalbhavi Vadhi, Sharma Jyoti, Dutt Naveen, Yadav Pankaj
Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, India.
Department of Electrical Engineering, Indian Institute of Technology, Jodhpur, Rajasthan, India.
NPJ Syst Biol Appl. 2025 Jan 17;11(1):11. doi: 10.1038/s41540-025-00491-4.
Classification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) poses significant challenges for cytopathologists, often necessitating clinical tests and biopsies that delay treatment initiation. To address this, we developed a machine learning-based approach that uses the resected lung-tissue microbiome of AC and SCC patients for subtype classification. Differentially enriched taxa were identified using linear discriminant analysis effect size (LEfSe), revealing ten potential microbial markers. Linear discriminant analysis (LDA) was subsequently applied to enhance inter-class separability. Next, six supervised classification algorithms were benchmarked: logistic regression, naïve Bayes, random forest, extreme gradient boosting (XGBoost), k-nearest neighbors, and a deep neural network. Notably, XGBoost, with an accuracy of 76.25% and an area under the receiver operating characteristic curve (AUROC) of 0.81, with 69% specificity and 76% sensitivity, outperformed the other five classification algorithms using LDA-transformed features. Validation on an independent dataset confirmed its robustness with an AUROC of 0.71, with minimal false positives and negatives. This study is the first to classify AC and SCC subtypes using the lung-tissue microbiome.
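The performance figures quoted in the abstract (accuracy, sensitivity, specificity, AUROC) all derive from a classifier's predicted labels and scores on held-out samples. The following is a minimal sketch of how such metrics are conventionally computed for a two-class problem. It is not the authors' code: the function names, the 0.5 decision threshold, the choice of which subtype is coded as the positive class, and the toy labels and scores are all assumptions made for illustration and are unrelated to the study's data.

```python
# Illustrative only: toy labels and scores, NOT data from the study.
# Assumption: one subtype is coded 1 (positive) and the other 0 (negative).

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auroc(y_true, scores):
    """AUROC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = summarize(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported side by side.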