Hassan Ayesha, Ramzan Shabana, Raza Ali, Munwar Iqbal Muhammad, Smerat Aseel, Latif Fitriyani Norma, Syafrudin Muhammad, Won Lee Seung
Department of Computer Science & IT, Government Sadiq College Women University Bahawalpur, Punjab, Pakistan.
Department of Software Engineering, The University of Lahore, Lahore, Punjab, Pakistan.
Digit Health. 2025 Jun 6;11:20552076251341430. doi: 10.1177/20552076251341430. eCollection 2025 Jan-Dec.
Hypothyroidism, hyperthyroidism, thyroid nodules, and other thyroid disorders affect millions of people worldwide, and when left untreated they may lead to serious health issues. Accurate and timely diagnosis is crucial for proper management and medication. This study uses a dataset from the UCI Machine Learning Repository to put forward a comprehensive machine-learning approach for diagnosing thyroid disorders.
The proposed methodology involves exploratory data analysis and data preparation, including handling missing values, encoding categorical values, and selecting features. The synthetic minority over-sampling technique (SMOTE) is used to address class imbalance. Five advanced machine learning (ML) algorithms (logistic regression, support vector machine, decision tree, random forest, and gradient boosting) are employed to develop predictive models. Further, an innovative stacking ensemble method is proposed that combines four of the applied models: their outputs are aggregated, and logistic regression serves as the meta-learner.
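The oversampling step can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration; the abstract does not state which implementation or settings the authors used.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only, not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(minority_samples, n_new=4, k=2)
```

Because each synthetic point lies between two real minority samples, it stays inside the region the minority class already occupies. This sketch handles numeric features only, so categorical values would need to be encoded first, as the preparation step describes.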
A 10-fold cross-validation technique is used to ensure robust model evaluation and reduce the risk of overfitting: each of the 10 subsets serves once as the test set while the model is trained on the remaining subsets. The ensemble model attained an accuracy of 99.86%, outperforming the individual models.
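The fold rotation in 10-fold cross-validation can be sketched in plain Python as follows. The helper `k_fold_indices`, the shuffling seed, and the sample count are illustrative assumptions, and the abstract does not specify whether the authors used stratified folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only, not the study's code.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample appears in exactly one test fold across the 10 rounds.
splits = list(k_fold_indices(n_samples=25, k=10))
```

A model would be fitted on `train` and scored on `test` in each round, with the 10 scores then averaged. When oversampling is combined with cross-validation, it is commonly applied to the training indices only, so that synthetic samples do not leak into the test fold.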
These results demonstrate the capability of ML, especially ensemble approaches, to support accurate and timely diagnosis of thyroid disorders.