Islam Saima Sharleen, Haque Md Samiul, Miah M Saef Ullah, Sarwar Talha Bin, Nugraha Ramdhan
Department of Computer Science, Faculty of Science and Technology, American International University - Bangladesh (AIUB), Dhaka, Bangladesh.
Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia.
PeerJ Comput Sci. 2022 Mar 3;8:e898. doi: 10.7717/peerj-cs.898. eCollection 2022.
Thyroid disease is the general term for medical conditions in which the thyroid does not produce the right amount of hormones. It can affect anyone: men, women, children, adolescents, and the elderly. Thyroid disorders are detected by blood tests, which are notoriously difficult to interpret because of the enormous amount of data needed to forecast results. This study therefore compares eleven machine learning algorithms to determine which one predicts thyroid risk most accurately. It uses the Sick-euthyroid dataset from the machine learning repository of the University of California, Irvine. Because most samples in this dataset belong to a single target class, the accuracy score alone does not reliably indicate prediction quality. The evaluation metrics therefore include both accuracy and recall. In addition, the F1-score produces a single value that balances precision and recall when the class distribution is uneven. The F1-score is used as the final measure of the performance of the employed algorithms, as it is one of the most effective measures for imbalanced classification problems. The experiment shows that the ANN classifier, with an F1-score of 0.957, outperforms the other algorithms.
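The abstract's point about accuracy on an imbalanced dataset can be illustrated with a minimal sketch. The labels below are hypothetical (90 negative and 10 positive samples) and are not drawn from the Sick-euthyroid dataset, and the helper names `confusion_counts` and `scores` are introduced here for illustration only. The sketch shows that a model predicting only the majority class still reaches high accuracy while its recall and F1-score collapse to zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 90 negatives followed by 10 positives.
y_true = [0] * 90 + [1] * 10

# A trivial model that always predicts the majority (negative) class.
majority_pred = [0] * 100

# A hypothetical model with 2 false positives and 2 false negatives.
better_pred = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

majority_scores = scores(y_true, majority_pred)
better_scores = scores(y_true, better_pred)

for name, result in (("majority", majority_scores), ("better", better_scores)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this toy setting the accuracy gap between the two models is small, while the F1-score separates them clearly, which is consistent with the abstract's choice of F1 as the deciding metric.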