Application of machine learning algorithms to predict the thyroid disease risk: an experimental comparative study.

Authors

Islam Saima Sharleen, Haque Md Samiul, Miah M Saef Ullah, Sarwar Talha Bin, Nugraha Ramdhan

Affiliations

Department of Computer Science, Faculty of Science and Technology, American International University - Bangladesh (AIUB), Dhaka, Bangladesh.

Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia.

Publication information

PeerJ Comput Sci. 2022 Mar 3;8:e898. doi: 10.7717/peerj-cs.898. eCollection 2022.

DOI:10.7717/peerj-cs.898

PMID:35494828

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9044232/

Abstract

Thyroid disease is the general term for medical conditions that prevent the thyroid from producing the right amount of hormones. Thyroid disease can affect anyone: men, women, children, adolescents, and the elderly. Thyroid disorders are detected by blood tests, which are notoriously difficult to interpret because of the large amount of data needed to forecast results. For this reason, this study compares eleven machine learning algorithms to determine which one predicts thyroid disease risk most accurately. For this purpose, the study uses the Sick-euthyroid dataset from the University of California, Irvine (UCI) Machine Learning Repository. Because most instances in this dataset belong to a single target class, the accuracy score alone does not reliably reflect prediction quality. The evaluation therefore includes both accuracy and recall. In addition, the F1-score combines precision and recall into a single value, which is useful when the class distribution is uneven. Finally, the F1-score is used to compare the machine learning algorithms, as it is one of the most effective measures for imbalanced classification problems. The experiment shows that the artificial neural network (ANN) classifier, with an F1-score of 0.957, outperforms the other nine algorithms.
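The abstract's argument, that accuracy is misleading when one class dominates while the F1-score is not, can be illustrated with a small worked example. This is a minimal sketch under stated assumptions, not the study's code: the labels below are hypothetical (90 negatives, 10 positives) and are not taken from the Sick-euthyroid dataset, and F1 is computed for the minority (positive) class as F1 = 2PR / (P + R), where P is precision and R is recall. The abstract does not say whether the reported 0.957 is a binary, macro-averaged, or weighted F1.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Return accuracy plus precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Treat 0/0 as 0.0, so a model that never predicts the positive class scores 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels (NOT the Sick-euthyroid data): 90 negatives, then 10 positives.
y_true = [0] * 90 + [1] * 10

# A trivial model that always predicts the majority (negative) class.
always_negative = [0] * 100

# A hypothetical model: 86 true negatives, 4 false alarms, 8 hits, 2 misses.
some_model = [0] * 86 + [1] * 4 + [1] * 8 + [0] * 2

print(scores(y_true, always_negative))
print(scores(y_true, some_model))
```

Here the always-negative model reaches 0.90 accuracy but an F1 of 0, while the second model reaches 0.94 accuracy and an F1 of 8/11 (about 0.73). Accuracy barely separates the two models; F1 separates them clearly. scikit-learn's `sklearn.metrics.f1_score` computes the same binary F1 by default (`average="binary"`, `pos_label=1`).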

Similar articles

SSC: The novel self-stack ensemble model for thyroid disease prediction.

PLoS One. 2024 Jan 3;19(1):e0295501. doi: 10.1371/journal.pone.0295501. eCollection 2024.

Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches.

JMIR Med Inform. 2020 Jun 4;8(6):e16678. doi: 10.2196/16678.

Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer.

Int J Med Inform. 2020 Apr;136:104068. doi: 10.1016/j.ijmedinf.2019.104068. Epub 2019 Dec 28.

Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry.

Funct Integr Genomics. 2024 Feb 2;24(1):23. doi: 10.1007/s10142-024-01289-z.

A Comparative Study of Automated Machine Learning Platforms for Exercise Anthropometry-Based Typology Analysis: Performance Evaluation of AWS SageMaker, GCP VertexAI, and MS Azure.

Bioengineering (Basel). 2023 Jul 27;10(8):891. doi: 10.3390/bioengineering10080891.

A new hybrid ensemble machine-learning model for severity risk assessment and post-COVID prediction system.

Math Biosci Eng. 2022 Apr 13;19(6):6102-6123. doi: 10.3934/mbe.2022285.

Exploiting Machine Learning Algorithms and Methods for the Prediction of Agitated Delirium After Cardiac Surgery: Models Development and Validation Study.

JMIR Med Inform. 2019 Oct 23;7(4):e14993. doi: 10.2196/14993.

Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield.

AJR Am J Roentgenol. 2017 Apr;208(4):750-753. doi: 10.2214/AJR.16.16128. Epub 2017 Jan 31.

Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms.

J Healthc Eng. 2020 Mar 9;2020:4984967. doi: 10.1155/2020/4984967. eCollection 2020.

Cited by

REDf: a deep learning model for short-term load forecasting to facilitate renewable integration and attaining the SDGs 7, 9, and 13.

PeerJ Comput Sci. 2025 Apr 23;11:e2819. doi: 10.7717/peerj-cs.2819. eCollection 2025.

Comprehensive framework for thyroid disorder diagnosis: Integrating advanced feature selection, genetic algorithms, and machine learning for enhanced accuracy and other performance matrices.

PLoS One. 2025 Jun 18;20(6):e0325900. doi: 10.1371/journal.pone.0325900. eCollection 2025.

Enhancing breast cancer prediction through stacking ensemble and deep learning integration.

PeerJ Comput Sci. 2025 Feb 3;11:e2461. doi: 10.7717/peerj-cs.2461. eCollection 2025.

Advanced Brain Tumor Classification in MR Images Using Transfer Learning and Pre-Trained Deep CNN Models.

Cancers (Basel). 2025 Jan 2;17(1):121. doi: 10.3390/cancers17010121.

Synthetic Boosted Resampling Using Deep Generative Adversarial Networks: A Novel Approach to Improve Cancer Prediction from Imbalanced Datasets.

Cancers (Basel). 2024 Dec 2;16(23):4046. doi: 10.3390/cancers16234046.

Machine learning based tuberculosis (ML-TB) health predictor model: early TB health disease prediction with ML models for prevention in developing countries.

PeerJ Comput Sci. 2024 Oct 16;10:e2397. doi: 10.7717/peerj-cs.2397. eCollection 2024.

Enhanced interpretable thyroid disease diagnosis by leveraging synthetic oversampling and machine learning models.

BMC Med Inform Decis Mak. 2024 Nov 29;24(1):364. doi: 10.1186/s12911-024-02780-0.

Learning from Imbalanced Data: Integration of Advanced Resampling Techniques and Machine Learning Models for Enhanced Cancer Diagnosis and Prognosis.

Cancers (Basel). 2024 Oct 8;16(19):3417. doi: 10.3390/cancers16193417.

Analysis and interpretability of machine learning models to classify thyroid disease.

PLoS One. 2024 May 31;19(5):e0300670. doi: 10.1371/journal.pone.0300670. eCollection 2024.

Deep learning and content-based filtering techniques for improving plant disease identification and treatment recommendations: A comprehensive review.

Heliyon. 2024 Apr 16;10(9):e29583. doi: 10.1016/j.heliyon.2024.e29583. eCollection 2024 May 15.
