VIT Bhopal University, School of Biosciences, Engineering and Technology, Kothrikalan, Madhya Pradesh, India.
Institute of Aeronautical Engineering, Department of CSE, Hyderabad, Telangana, India.
Sci Rep. 2023 Jan 10;13(1):485. doi: 10.1038/s41598-023-27548-w.
Metastatic breast cancer (MBC) is one of the leading causes of cancer-related death in women. Cancer classification currently relies on histopathological information about the malignancy, despite its several limitations. The objective of our study was to develop a non-invasive breast cancer classification system for diagnosing cancer metastases. Python modules for text mining, data processing, and machine learning (ML) were developed in Jupyter Notebook under Anaconda. The predictive performance of the ML models was assessed with cross-validated classification metrics, including accuracy, the ROC curve, and AUC. Welch's unpaired t-test was used to assess the statistical significance of differences in the datasets. A text mining framework applied to Electronic Medical Records (EMR) made it easier to extract blood profile data and identify MBC patients. Monocytes showed a noticeable mean difference between MBC patients and healthy individuals. Removing outliers from the blood profile data markedly improved the accuracy of the ML models. A Decision Tree (DT) classifier achieved an accuracy of 83% with an AUC of 0.87. We then deployed the DT classifier with Flask as a web application for robust diagnosis of MBC patients. Taken together, we conclude that ML models based on blood profile data may assist physicians in identifying MBC patients who need intensive care, with the aim of improving overall survival.
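The analysis steps named in the abstract (Welch's unpaired t-test, outlier removal, and cross-validated evaluation of a Decision Tree classifier) can be illustrated with a minimal Python sketch. This is not the authors' code. The data are synthetic placeholders, the three feature columns (monocytes, white blood cells, hemoglobin) and their values are hypothetical, and the 1.5 × IQR outlier rule, the tree depth, and the five folds are assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_blood_profiles(n_per_group=200, seed=0):
    """Return placeholder data (NOT real patient data).

    Columns are hypothetical features: monocytes, white blood cells,
    hemoglobin. Label 0 = healthy, 1 = MBC.
    """
    rng = np.random.default_rng(seed)
    healthy = rng.normal(loc=[0.50, 6.5, 13.5], scale=[0.12, 1.5, 1.2],
                         size=(n_per_group, 3))
    mbc = rng.normal(loc=[0.70, 7.0, 12.0], scale=[0.18, 2.0, 1.5],
                     size=(n_per_group, 3))
    X = np.vstack([healthy, mbc])
    y = np.concatenate([np.zeros(n_per_group, dtype=int),
                        np.ones(n_per_group, dtype=int)])
    return X, y


def welch_t_test(a, b):
    """Welch's unpaired t-test: equal_var=False drops the equal-variance assumption."""
    result = stats.ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue


def iqr_inlier_mask(X, k=1.5):
    """Boolean mask of rows whose every feature lies within k * IQR of the quartiles.

    The IQR rule is an assumed stand-in for the unspecified outlier criterion.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


def cross_validated_scores(X, y, folds=5, seed=0):
    """Mean cross-validated accuracy and ROC AUC of a shallow decision tree."""
    clf = DecisionTreeClassifier(max_depth=3, random_state=seed)
    accuracy = cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
    auc = cross_val_score(clf, X, y, cv=folds, scoring="roc_auc")
    return accuracy.mean(), auc.mean()


if __name__ == "__main__":
    X, y = make_synthetic_blood_profiles()

    # Compare the hypothetical monocyte column between the two groups.
    t_stat, p_value = welch_t_test(X[y == 1, 0], X[y == 0, 0])
    print(f"Welch t = {t_stat:.2f}, p = {p_value:.3g}")

    # Evaluate the classifier before and after dropping outlier rows.
    mask = iqr_inlier_mask(X)
    print("all rows:      acc=%.3f auc=%.3f" % cross_validated_scores(X, y))
    print("outliers out:  acc=%.3f auc=%.3f" % cross_validated_scores(X[mask], y[mask]))
```

Scores printed by this sketch describe the synthetic data only and say nothing about the 83% accuracy or 0.87 AUC reported above. In a real study, outlier thresholds would be derived from the training folds alone so that the held-out folds do not influence preprocessing.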