

Metabolomics-Based Machine Learning Models Accurately Predict Breast Cancer Estrogen Receptor Status.

Authors

Arumalla Kamala K, Haince Jean-François, Bux Rashid A, Huang Guoyu, Tappia Paramjit S, Ramjiawan Bram, Ford W Randolph, Vaida Maria

Affiliations

Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA.

BioMark Diagnostic Solutions Inc., Quebec, QC G1K 3G5, Canada.

Publication information

Int J Mol Sci. 2024 Dec 4;25(23):13029. doi: 10.3390/ijms252313029.

Abstract

Breast cancer is a leading cause of death among women worldwide. Early and precise diagnosis is vital for managing the disease effectively, and subtyping based on estrogen receptor (ER) status is crucial for determining prognosis and treatment. This study uses metabolomics data from plasma samples to identify metabolite biomarkers that could distinguish ER-positive from ER-negative breast cancers non-invasively. The dataset includes demographic information, ER status, and metabolite levels from 188 breast cancer patients and 73 healthy controls. Recursive Feature Elimination (RFE) with a Random Forest (RF) classifier identified an optimal subset of 30 features (29 biomarkers and age) that achieved the highest area under the curve (AUC). To address class imbalance, Gaussian noise-based augmentation and Adaptive Synthetic Oversampling (ADASYN) were applied, ensuring balanced representation during training. Four machine learning (ML) algorithms were evaluated using grid search: Random Forest, Support Vector Classifier (SVC), XGBoost, and Logistic Regression (LR). The Random Forest classifier was the top performer, achieving an AUC of 0.95 and an accuracy of 93%. These results suggest that ML has great promise for identifying specific metabolites linked to ER expression, paving the way for a novel analytical tool that could reduce current challenges in determining ER status and improve the precision of breast cancer subtyping.
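The abstract names Gaussian noise-based augmentation as one of its class-balancing steps but gives no implementation details. The following is a minimal sketch of the general idea only, not the authors' code. The function name `augment_with_gaussian_noise`, the `noise_sd` scale, the choice to scale noise by each feature's standard deviation, and the toy data are all assumptions made for illustration.

```python
import random


def augment_with_gaussian_noise(rows, labels, minority_label, noise_sd=0.05, seed=0):
    """Balance a binary dataset by appending noisy copies of minority-class rows.

    Each synthetic row is a randomly chosen minority row with independent
    zero-mean Gaussian noise added to every feature. The noise scale is
    noise_sd times that feature's standard deviation within the minority
    class (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    n_majority = len(rows) - len(minority)
    n_needed = n_majority - len(minority)
    if not minority or n_needed <= 0:
        return list(rows), list(labels)

    # Per-feature standard deviation of the minority class (population form).
    sds = []
    for j in range(len(minority[0])):
        col = [r[j] for r in minority]
        mean = sum(col) / len(col)
        sds.append((sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5)

    new_rows, new_labels = list(rows), list(labels)
    for _ in range(n_needed):
        base = rng.choice(minority)
        new_rows.append([v + rng.gauss(0.0, noise_sd * sd) for v, sd in zip(base, sds)])
        new_labels.append(minority_label)
    return new_rows, new_labels


# Hypothetical toy data: six rows of one class (label 1), two of the other (label 0).
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [1.1, 5.3], [1.0, 4.9], [1.3, 5.2],
     [2.0, 3.0], [2.4, 3.4]]
y = [1, 1, 1, 1, 1, 1, 0, 0]

X_bal, y_bal = augment_with_gaussian_noise(X, y, minority_label=0)
print(len(X_bal), y_bal.count(0), y_bal.count(1))  # 12 6 6
```

In line with the abstract's phrase "during training", a step like this would normally be applied to the training split only, so that held-out samples used to estimate AUC and accuracy remain unaugmented.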


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c574/11641454/02abe39d1020/ijms-25-13029-g001.jpg
