Rafiepoor Haniyeh, Ghorbankhanloo Alireza, Zendehdel Kazem, Madar Zahra Zangeneh, Hajivalizadeh Sepideh, Hasani Zeinab, Sarmadi Ali, Amanpour-Gharaei Behzad, Barati Mohammad Amin, Saadat Mozafar, Sadegh-Zadeh Seyed-Ali, Amanpour Saeid
Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran.
School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran.
Cancer Rep (Hoboken). 2025 Apr;8(4):e70175. doi: 10.1002/cnr2.70175.
Breast cancer (BC) is a major global health concern, with rising incidence and mortality rates in many developing countries. Effective BC risk assessment models are crucial for prevention and early detection. While the Gail model, a traditional logistic regression-based model, has been widely used, its predictive performance may be limited by its linear assumptions. With the rapid advancement of artificial intelligence (AI) in medical sciences, various complex machine learning algorithms have been developed for risk prediction, including BC risk prediction.
This study aims to compare AI-based models with the traditional Gail model in assessing BC risk using a population dataset, and to evaluate how well these models predict BC risk.
This study involved 942 newly diagnosed BC patients and 975 healthy controls at the Cancer Institute in the IKH Hospital Complex, Tehran. Ten classification algorithms were applied to the dataset. The accuracy, sensitivity, precision, and feature importance of the machine learning algorithms were assessed and compared with those reported in previous studies. The AI algorithms alone did not significantly improve predictive performance compared with the Gail model. However, the importance of variables varied significantly among the AI algorithms. Understanding feature importance and interactions is crucial in AI modeling to enhance accuracy and identify critical risk factors.
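As a rough illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, sensitivity, and precision for a binary case/control classifier. The function name `classification_metrics` and the toy labels are hypothetical and invented for illustration; they are not the study's code or data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall) and precision for binary labels.

    Labels are assumed to be 1 for a case and 0 for a control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases predicted as cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls predicted as controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls predicted as cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases predicted as controls
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy labels, invented purely for illustration (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many true cases the model catches, while precision measures how many predicted cases are real; reporting both alongside accuracy guards against a model that looks accurate only because it favors one class.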
This study concluded that incorporating specific risk factors, such as genetic and image-related variables, may be necessary to further improve the accuracy of BC risk prediction models. Furthermore, future research should address the modeling issues that arise in models with a restricted number of features.