Hamedi Seyedeh Zahra, Emami Hassan, Khayamzadeh Maryam, Rabiei Reza, Aria Mehrad, Akrami Majid, Zangouri Vahid
Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Sci Rep. 2024 Dec 3;14(1):30147. doi: 10.1038/s41598-024-81734-y.
Breast cancer is one of the most prevalent cancers in Iran, with rising incidence and mortality rates. Survival analysis plays a pivotal role in setting appropriate care plans. To the best of our knowledge, this study is the first in Iran to use a multi-method approach, combining a Deep Neural Network (DNN) with 11 conventional machine learning (ML) methods, to predict the 5-year survival of women with breast cancer. The study is further distinguished by its use of data from two centers, totaling 2644 records, and by external validation. Thirty-four features were selected based on a literature review and on the variables common to both datasets. Feature selection was also performed using a p-value criterion (p < 0.05) and a survey of oncologists. A total of 108 models were trained. In external validation, the DNN model trained on the Shiraz dataset with all features achieved the highest accuracy (85.56%). Although the DNN model showed superior accuracy in external validation, it did not consistently achieve the highest performance across all evaluation metrics. Notably, models trained on the Shiraz dataset outperformed those trained on the Tehran dataset, possibly because the Shiraz dataset had fewer missing values.
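The observation that the most accurate model need not lead on every evaluation metric can be illustrated with a minimal sketch. The labels, the two sets of predictions, and all names below (`confusion_counts`, `metrics`, `pred_a`, `pred_b`) are hypothetical toy values chosen for illustration; they are not the study's data, models, or code, and the choice of sensitivity and specificity as the additional metrics is an assumption.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (1 = survived 5 years, 0 = did not)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    positives = c["tp"] + c["fn"]
    negatives = c["tn"] + c["fp"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "sensitivity": c["tp"] / positives if positives else 0.0,
        "specificity": c["tn"] / negatives if negatives else 0.0,
    }


# Hypothetical external-validation set: 8 survivors, 2 non-survivors.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# Hypothetical predictions from two models evaluated on that set.
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # misses one non-survivor
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # misses two survivors

for name, pred in (("model A", pred_a), ("model B", pred_b)):
    print(name, metrics(y_true, pred))
```

In this toy case model A has the higher accuracy while model B has the higher specificity, which is why a comparison across many models usually reports several metrics rather than accuracy alone.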