Department of Computer Science and Engineering, Rangamati Science and Technology University, Vedvedi, Rangamati, Bangladesh.
Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh.
BMC Bioinformatics. 2021 Apr 24;22(1):213. doi: 10.1186/s12859-021-04131-6.
In this research, an intelligent system was developed using machine learning and data mining approaches to predict the risk level of cervical and ovarian cancer in association with stress.
For the factors and subfactors under study, several machine learning models (Logistic Regression, Random Forest, AdaBoost, Naïve Bayes, Neural Network, kNN, CN2 Rule Inducer, Decision Tree, and Quadratic Classifier) were compared using standard metrics such as F1, AUC, and CA (classification accuracy). For added certainty, information gain, gain ratio, and Gini index were reported for both cervical and ovarian cancer. Attributes were ranked using different feature selection evaluators, and the final analysis was then carried out on the most significant factors. Factors such as children, age at first intercourse, age of husband, Pap test, and age are the most significant factors for cervical cancer. On the other hand, genital area infection, pregnancy problems, use of drugs, abortion, and the number of children are important factors for ovarian cancer.
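The three ranking scores named above can be illustrated with a minimal sketch. It assumes categorical attributes and a categorical risk label, and it uses the textbook definitions of information gain, gain ratio, and Gini decrease. The function names and the toy values are invented for illustration and are not taken from the study's data or tooling.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a list of categorical values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def rank_scores(feature, labels):
    """Score one categorical feature against the class labels.

    Returns information gain, gain ratio, and the decrease in Gini
    impurity obtained by splitting the labels on the feature's values.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)

    weighted_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    weighted_gini = sum(len(g) / n * gini(g) for g in groups.values())

    info_gain = entropy(labels) - weighted_entropy
    split_info = entropy(feature)  # entropy of the feature's own distribution
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {
        "info_gain": info_gain,
        "gain_ratio": gain_ratio,
        "gini": gini(labels) - weighted_gini,
    }


# Toy, invented values: one attribute that separates the classes perfectly
# and one that carries no information about them.
risk = ["high", "high", "low", "low"]
informative = ["yes", "yes", "no", "no"]
uninformative = ["a", "b", "a", "b"]

ranking = sorted(
    {"informative": informative, "uninformative": uninformative}.items(),
    key=lambda item: rank_scores(item[1], risk)["info_gain"],
    reverse=True,
)
print([name for name, _ in ranking])
```

Sorting attributes by any one of these scores gives a ranking of the kind a feature selection evaluator produces; the three scores can disagree when attributes have many distinct values, which is one reason to report more than one of them.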
The resulting factors were merged, categorized, and weighted according to their significance level. The categorized factors were indexed using a ranker algorithm, which assigns each a weight. An algorithm was then formulated that can be used to predict the risk level of cervical and ovarian cancer in relation to women's mental health. The research is expected to have a substantial impact in low-income countries like Bangladesh, as most women in low-income nations are unaware of this risk. As these two can be described as the most sensitive cancers for women, an application developed from the algorithm will also help to reduce women's mental stress. More data and parameters will be added in future research in this direction.
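The abstract does not specify the form of the risk-level algorithm. One plausible shape, shown here only as a hedged sketch, is a normalized weighted sum of the factors a respondent reports, mapped to a level by cutoffs. The factor names, weights, and cutoffs below are hypothetical placeholders, not values from the study.

```python
def risk_level(answers, weights, cutoffs=((0.66, "high"), (0.33, "medium"))):
    """Map reported factors to a normalized score and a risk level.

    answers: dict of factor name -> bool (True if the factor is present).
    weights: dict of factor name -> non-negative weight (hypothetical here).
    cutoffs: (threshold, label) pairs in descending order (assumed values).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Sum the weights of the factors that are present, then normalize to [0, 1].
    present = sum(w for name, w in weights.items() if answers.get(name, False))
    score = present / total

    for threshold, label in cutoffs:
        if score >= threshold:
            return score, label
    return score, "low"


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
example_weights = {"factor_a": 3, "factor_b": 2, "factor_c": 1}

print(risk_level({"factor_a": True}, example_weights))
print(risk_level({"factor_a": True, "factor_b": True}, example_weights))
print(risk_level({}, example_weights))
```

Normalizing by the total weight keeps the score comparable when factors are added or removed, which matters if more parameters are introduced later.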