Alam Suha Sayma, Islam Muhammad Nazrul
Department of Computer Science and Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh.
Heliyon. 2023 Mar 16;9(3):e14518. doi: 10.1016/j.heliyon.2023.e14518. eCollection 2023 Mar.
Polycystic ovary syndrome (PCOS) is the most frequent endocrinological anomaly in women of reproductive age. It causes persistent disruption of hormonal secretion, leading to the formation of numerous cysts within the ovaries and to serious health complications. Clinical detection of PCOS in practice is difficult, however, because the accuracy of interpretation depends substantially on the physician's expertise. An artificially intelligent PCOS prediction model might therefore be a feasible complement to the error-prone and time-consuming diagnostic procedure. In this study, a modified ensemble machine learning (ML) classification approach is proposed that uses the state-of-the-art stacking technique to identify PCOS from patients' symptom data, employing five traditional ML models as base learners and one bagging or boosting ensemble ML model as the meta-learner of the stacked model. Furthermore, three distinct feature selection strategies are applied to pick feature sets that differ in the number and combination of attributes. To evaluate and explore the dominant features necessary for predicting PCOS, five variants of the proposed technique and ten other types of classifiers are trained, tested, and assessed on the different feature sets. The proposed stacking ensemble technique significantly improves accuracy over the other existing ML-based techniques for every feature set. Among the models investigated to separate PCOS from non-PCOS patients, the stacking ensemble model with a 'Gradient Boosting' classifier as meta-learner performs best, reaching 95.7% accuracy with the top 25 features selected using the Principal Component Analysis (PCA) feature selection technique.
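The stacking arrangement described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the five base learners, so the ones below are placeholders. The data is synthetic rather than the study's patient records. "Top 25 features selected using PCA" is interpreted here as projecting onto 25 principal components.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient symptom table (assumption: NOT the study's data).
X, y = make_classification(
    n_samples=500, n_features=40, n_informative=12, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Five traditional base learners. The specific choices are hypothetical;
# the abstract only states that five traditional ML models are used.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(random_state=0)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# A boosting ensemble acts as the meta-learner. It is trained on the
# cross-validated (out-of-fold) predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)

# Standardize, reduce to 25 principal components, then fit the stacked model.
model = make_pipeline(StandardScaler(), PCA(n_components=25), stack)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed on synthetic data says nothing about the 95.7% reported in the study. The sketch only shows how base learners, a boosting meta-learner, and a PCA step fit together in one pipeline.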