Baykemagn Nebebe Demis, Alemayehu Meron Asmamaw, Yehuala Tirualem Zeleke, Walle Agmasie Damtew, Gedefaw Andualem Enyew, Mengistu Abraham Keffale
Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
Sci Rep. 2025 Jun 4;15(1):19604. doi: 10.1038/s41598-025-03112-6.
Breast self-examination is a cost-effective approach that substantially reduces the costs associated with medical equipment, healthcare practitioners' fees, transportation to health facilities, and other indirect expenses. It also improves access to health services and helps avert the transmission of infectious diseases in low- and middle-income countries, making it a sustainable route to public health gains. We used a total weighted sample of 133,425 from the Demographic and Health Survey, with STATA version 17, MS Excel 2016, and Python 3.10 for data management. Min-Max scaling and standard scaling were used for variable scaling, and Recursive Feature Elimination was used for feature selection. The data were split in an 80:20 ratio for training and testing and balanced using Tomek Links combined with Random Over-Sampling. Model performance was evaluated using ROC-AUC, accuracy, F1 score, recall, and precision. The Decision Tree model performed best, with an accuracy of 82% and an AUC of 0.87. This performance is attributed to its capacity to represent non-linear associations and interactions in the data, which more conventional models such as logistic regression struggled to capture. Woman's age, smartphone availability, marital status, health facility visits, HIV testing, number of children, examination by healthcare providers, wealth status, place of residence, mother's occupation, education level, social media use, health status, and distance to health facilities were predictors of breast self-examination.
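The scaling and class-balancing steps named above can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the toy data, and the binary 0/1 labels are assumptions made for illustration, and in practice a library such as imbalanced-learn would normally be used and the resampling applied to the training split only.

```python
import math
import random


def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def standard_scale(column):
    """Centre to mean 0 and scale by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def nearest_neighbour(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    others = (j for j in range(len(X)) if j != i)
    return min(others, key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link.

    A Tomek link is a pair of points from different classes that are
    each other's nearest neighbour. Assumes binary labels.
    """
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority_label else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_majority - len(minority))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


# Toy data (hypothetical): label 1 is the minority class.
X_toy = [[0.0, 0.0], [0.2, 0.0], [0.25, 0.0], [1.0, 1.0], [1.0, 1.2], [3.0, 3.0]]
y_toy = [0, 0, 1, 0, 0, 1]

X_clean, y_clean = remove_tomek_links(X_toy, y_toy, majority_label=0)
X_bal, y_bal = random_oversample(X_clean, y_clean, minority_label=1)
```

In the toy data, the second and third points form the only Tomek link, so the majority-class point of that pair is removed before the minority class is oversampled to parity.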
In conclusion, the Decision Tree was the top-performing model, with an AUC of 0.87 and an accuracy of 82%, owing to its ability to capture non-linear relationships between the predictors and the target variable, its use of ensemble averaging and random feature selection to reduce variance and overfitting, and its inherent feature-importance mechanism, which keeps it robust to irrelevant features. Based on these findings, to increase awareness of breast self-examination (BSE), we recommend raising awareness among community leaders about breast cancer and the benefits of self-examination; deploying mobile health clinics and outreach programs; training health extension workers in proper BSE technique so that they can share it with the community; and launching radio and television campaigns in local languages to reach a large audience.
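For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and AUC can be computed for a binary classifier. The function names and the four-row example are hypothetical and are not taken from the study; AUC is computed here as the proportion of positive-negative pairs in which the positive case receives the higher score, so an AUC of 0.87 and "87%" describe the same quantity.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: predicted probabilities thresholded at 0.5.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two figures can differ for the same model.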