Department of Computer Science, Lagos State University, Nigeria.
Cancer Treat Res Commun. 2021;28:100396. doi: 10.1016/j.ctarc.2021.100396. Epub 2021 May 15.
Early and accurate diagnosis is one of the most important steps in combating breast cancer. Unfortunately, breast cancer is asymptomatic at an early stage; symptoms appear later, and by the symptomatic stage treatment may be complicated or even impossible, leading to death. Proper risk assessment is therefore very important in reducing mortality. Some computational techniques for breast cancer risk assessment have been developed in the developed world, but they do not work well in Africa because the risk profiles of African women differ, e.g. later menarche, low drug abuse and low smoking rates.
In this work, we propose a bespoke risk prediction model for African women using the Random Forest Classifier (RFC) machine learning technique.
A total of 180 subjects were studied, of whom 90 were confirmed cases of breast cancer and 90 were benign. Twenty-five risk factors were included, for example smoking, alcohol intake, occupational hazards and age at menopause. Four feature-selection approaches were compared empirically: Chi-Square, mutual information gain, Spearman correlation and use of the entire feature set. The RFC algorithm was used to develop the prediction model.
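To make the Chi-Square selection step concrete, the following is a minimal sketch of how a binary risk factor can be scored against case/benign status and how factors can then be ranked. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature names (`family_history`, `unrelated_factor`) and the helper functions are hypothetical, and all features are assumed to be coded as 0/1. Only the group sizes (90 cases, 90 benign) are taken from the text.

```python
def chi_square_score(feature, label):
    """Pearson chi-square statistic for a 0/1 feature against a 0/1 label."""
    # observed[f][y] counts subjects with feature value f and label y.
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        observed[f][y] += 1

    n = len(label)
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]

    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            # Count expected if feature and label were independent.
            expected = row_totals[f] * col_totals[y] / n
            if expected > 0:
                stat += (observed[f][y] - expected) ** 2 / expected
    return stat


def rank_features(columns, label):
    """Return (name, score) pairs sorted from most to least associated."""
    scores = {name: chi_square_score(values, label)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data (assumption): 90 cases (label 1) followed by 90 benign (label 0).
labels = [1] * 90 + [0] * 90
columns = {
    # Hypothetical factor present in 70 of 90 cases and 20 of 90 benign subjects.
    "family_history": [1] * 70 + [0] * 20 + [1] * 20 + [0] * 70,
    # Hypothetical factor split evenly in both groups, so it carries no signal.
    "unrelated_factor": [i % 2 for i in range(180)],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: chi2 = {score:.2f}")
```

Features with the highest scores would be retained and passed to the classifier. In practice a library routine (for example a chi-square selector combined with a random forest implementation) would normally replace these hand-written helpers.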
We found that family history of breast cancer, dense breast, deliberate abortion, age at first childbirth, fruit intake and regular exercise are predictors of breast cancer. With all risk factors included, the RFC model gave an accuracy of 91.67%, sensitivity of 87.10%, specificity of 96.55% and area under the curve (AUC) of 92%. With the correlation-selected features, it gave an accuracy of 96.67%, sensitivity of 93.75%, specificity of 100% and AUC of 97%. The Chi-Square-selected features gave the best performance, with 98.33% accuracy, 100% sensitivity, 96.55% specificity and 98% AUC. The mutual-information-gain-selected features gave the same results as the Chi-Square-selected features.
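For readers unfamiliar with these metrics, the sketch below shows how accuracy, sensitivity and specificity follow from confusion-matrix counts. The counts used (27 true positives, 4 false negatives, 28 true negatives, 1 false positive) are an assumption: the abstract does not report the test-set size or the confusion matrix, and these values were chosen only because they reproduce the percentages quoted for the all-features model. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        # Fraction of all subjects classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # Fraction of true cancer cases that the model flags (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Fraction of benign subjects that the model clears (true negative rate).
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only (assumption, not reported in the abstract).
metrics = classification_metrics(tp=27, fn=4, tn=28, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# prints:
# accuracy: 91.67%
# sensitivity: 87.10%
# specificity: 96.55%
```

AUC is not included here because it normally requires the model's predicted scores rather than the four counts alone.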
The Random Forest Classifier shows good potential for predicting the risk of breast cancer in African women. The study helped to identify the risk factors for breast cancer in African women. This is valuable information that can help African women pay attention to those risk factors, with the aim of reducing the incidence of breast cancer in Africa.