

Integrative Stacking Machine Learning Model for Small Cell Lung Cancer Prediction Using Metabolomics Profiling.

Author information

Sumon Md Shaheenur Islam, Malluhi Marwan, Anan Noushin, AbuHaweeleh Mohannad Natheef, Krzyslak Hubert, Vranic Semir, Chowdhury Muhammad E H, Pedersen Shona

Affiliations

Department of Electrical Engineering, Qatar University, Doha 2713, Qatar.

College of Medicine, QU Health, Qatar University, Doha 2713, Qatar.

Publication information

Cancers (Basel). 2024 Dec 18;16(24):4225. doi: 10.3390/cancers16244225.

Abstract

Small cell lung cancer (SCLC) is an extremely aggressive form of lung cancer, characterized by rapid progression and poor survival rates. Despite the importance of early diagnosis, current diagnostic techniques are invasive and limited. This study presents a novel stacking-based ensemble machine learning approach for classifying SCLC and non-small cell lung cancer (NSCLC) using metabolomics data. The analysis included 191 SCLC cases, 173 NSCLC cases, and 97 healthy controls. Feature selection techniques identified significant metabolites, with positive-ion features proving more relevant. For multi-class classification (control, SCLC, NSCLC), the stacking ensemble achieved 85.03% accuracy and an AUC of 92.47 using a Support Vector Machine (SVM). Binary classification (SCLC vs. NSCLC) further improved performance, with ExtraTreesClassifier reaching 88.19% accuracy and an AUC of 92.65. SHapley Additive exPlanations (SHAP) analysis identified key metabolites such as benzoic acid, DL-lactate, and L-arginine as significant predictors. By combining multiple classifiers, the stacking ensemble captures their complementary strengths and improves the detection of SCLC and NSCLC. This work highlights the potential of combining metabolomics with advanced machine learning for non-invasive early detection of lung cancer subtypes, offering an alternative to conventional biopsy methods.
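The stacking idea described in the abstract (several base classifiers whose predictions feed a meta-learner) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, generated with `make_classification` and sized to the 461 participants only for illustration, and the base learners (ExtraTreesClassifier, RandomForestClassifier, an RBF-kernel SVM), the logistic-regression meta-learner, and the helper names `build_stacking_model` and `evaluate` are hypothetical choices.

```python
# Minimal stacking-ensemble sketch on SYNTHETIC data (not the study's metabolomics data).
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Hypothetical configuration: base learners and meta-learner are assumptions."""
    base_learners = [
        ("extra_trees", ExtraTreesClassifier(n_estimators=100, random_state=random_state)),
        ("random_forest", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        # SVMs are scale-sensitive, so standardize features inside the pipeline.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=random_state))),
    ]
    # The meta-learner is fit on 5-fold out-of-fold class probabilities
    # produced by the base learners, which limits leakage from their training fit.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,
    )


def evaluate(n_classes=3, n_samples=461, random_state=0):
    """Hypothetical helper: returns (accuracy, AUC) on a stratified 80/20 split."""
    # Synthetic stand-in for a matrix of metabolite intensities (rows = subjects).
    X, y = make_classification(
        n_samples=n_samples,
        n_features=40,
        n_informative=10,
        n_classes=n_classes,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    model = build_stacking_model(random_state).fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    acc = accuracy_score(y_test, model.predict(X_test))
    if n_classes == 2:
        # Binary AUC uses the positive-class probability only.
        auc = roc_auc_score(y_test, proba[:, 1])
    else:
        # Multi-class AUC averaged one-vs-rest over the classes.
        auc = roc_auc_score(y_test, proba, multi_class="ovr")
    return acc, auc


if __name__ == "__main__":
    print("3-class (accuracy, AUC):", evaluate(n_classes=3))
    print("binary  (accuracy, AUC):", evaluate(n_classes=2))
```

Here `evaluate(n_classes=3)` loosely mirrors the control/SCLC/NSCLC setting and `evaluate(n_classes=2)` the SCLC vs. NSCLC setting; the scores come from synthetic data and say nothing about the results reported in the abstract. Using `stack_method="predict_proba"` with `cv=5` means the meta-learner sees each base learner's class probabilities on held-out folds, which is the standard way to let it learn how much to trust each base model.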


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/133a/11727543/5477926b18ca/cancers-16-04225-g001.jpg
