Sathyamoorthi Kishaanth, Vp Abishek, Venkataramana Lokeswari Y, Prasad D Venkata Vara
Department of Computer Science, Sri Sivasubramaniya Nadar College of Engineering, Chennai, Tamil Nadu, India.
Clin Breast Cancer. 2025 Jan;25(1):27-37. doi: 10.1016/j.clbc.2024.08.009. Epub 2024 Aug 30.
Cancer is the second leading cause of death globally; in 2020, breast cancer affected 2.3 million women and claimed 685,000 lives. Cancer prognosis plays a pivotal role in tailoring treatments and assessing their efficacy, which underscores the need for a comprehensive understanding of the disease. The goal of this work is to develop a predictive model capable of accurately predicting patient outcomes and guiding personalized treatment strategies, thereby advancing precision medicine in breast cancer care.
This project addresses limitations in current cancer prognosis models by integrating omics and non-omics data. Existing models often neglect crucial omics data such as DNA methylation and miRNA; the proposed method uses The Cancer Genome Atlas (TCGA) dataset to incorporate these data types along with others. For each data type, minimum redundancy maximum relevance (mRMR) feature selection is followed by a convolutional neural network (CNN) that performs feature extraction. The extracted features are then stacked, and a Random Forest classifier produces the final prognosis.
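The mRMR step named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of their mRMR variant, so this version assumes the greedy "difference" form (relevance minus mean redundancy) and substitutes absolute Pearson correlation for mutual information to stay dependency-free. The names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(columns, labels, k):
    """Greedy mRMR over a list of feature columns; returns up to k column indices.

    Assumption: |Pearson r| stands in for mutual information, both for
    relevance (feature vs. label) and redundancy (feature vs. feature).
    """
    if k < 1 or not columns:
        return []
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        best_j, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            # Mean redundancy with the features already chosen.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data (hypothetical): 1 = long-term survivor, 0 = short-term survivor.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 1.0, 0.9]  # strongly relevant
f1 = [0.1, 0.2, 0.2, 0.3, 0.9, 0.8, 1.0, 0.8]  # near-copy of f0 (redundant)
f2 = [1, 0, 0, 1, 1, 1, 0, 1]                  # weaker but less redundant

selected = mrmr_select([f0, f1, f2], labels, k=2)
print(selected)  # [0, 2]
```

In this toy case the near-duplicate `f1` is passed over in favor of the weaker `f2`, because its high correlation with the already-selected `f0` cancels out its relevance. In a pipeline like the one described, the selected columns for each data type would then be passed to that data type's CNN.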
The proposed method is applied to the dataset to predict whether a patient is a long-term or a short-term survivor. It achieves an AUC of 0.873, a precision of 0.881, and a sensitivity of 0.943. Its accuracy of 0.861 represents an improvement of 11.96% over prior studies.
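For readers unfamiliar with the reported metrics, the following sketch shows how precision, sensitivity, accuracy, and AUC are conventionally computed for a binary classifier. The counts and scores are hypothetical toy values, not data from the study, and the abstract does not state which survivor class is treated as positive. The function names `binary_metrics` and `pairwise_auc` are illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Precision, sensitivity (recall), and accuracy from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0, so no division by zero occurs.
    """
    return {
        "precision": tp / (tp + fp),            # correct share of predicted positives
        "sensitivity": tp / (tp + fn),          # detected share of actual positives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the rank-based definition
    of the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy values for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
auc = pairwise_auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])
print(metrics, auc)
```

Precision, sensitivity, and accuracy depend on a single decision threshold, whereas AUC is computed from the classifier's scores across all thresholds. This is why the four figures can differ as they do in the reported results.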
In conclusion, integrating diverse data with advanced machine learning holds promise for improving breast cancer prognosis. Addressing model limitations and leveraging comprehensive datasets can enhance accuracy, paving the way for better patient care. Further refinement offers potential for significant advancements in cancer prognosis and treatment strategies.