Chaganti Rajasekhar, Rustam Furqan, De La Torre Díez Isabel, Mazón Juan Luis Vidal, Rodríguez Carmen Lili, Ashraf Imran
Toyota Research Institute, Los Altos, CA 94022, USA.
Department of Software Engineering, School of System Sciences, University of Management and Technology, Lahore 54770, Pakistan.
Cancers (Basel). 2022 Aug 13;14(16):3914. doi: 10.3390/cancers14163914.
Thyroid disease prediction has recently emerged as an important task. Existing diagnostic approaches typically target binary classification, rely on small datasets, and report results that are not validated. They also focus predominantly on model optimization, leaving feature engineering less investigated. To overcome these limitations, this study presents an approach that investigates feature engineering for machine learning and deep learning models. Forward feature selection, backward feature elimination, bidirectional feature elimination, and machine learning-based feature selection using extra tree classifiers are adopted. The proposed approach can predict Hashimoto's thyroiditis (primary hypothyroid), binding protein (increased binding protein), autoimmune thyroiditis (compensated hypothyroid), and non-thyroidal syndrome (NTIS) (concurrent non-thyroidal illness). Extensive experiments show that features selected by the extra tree classifier yield the best results, with 0.99 accuracy and F1 score, when used with the random forest classifier. The results suggest that machine learning models are a better choice for thyroid disease detection in terms of accuracy and computational complexity. K-fold cross-validation and a performance comparison with existing studies corroborate the superior performance of the proposed approach.
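The pipeline described in the abstract (extra tree-based feature selection, a random forest classifier, and k-fold cross-validation) might look roughly like the following minimal sketch. It assumes scikit-learn and is not the authors' implementation. The synthetic five-class dataset, the number of trees, the five folds, the default mean-importance selection threshold, and the macro-F1 metric are all illustrative assumptions, not values taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic multi-class data as a hypothetical stand-in (NOT the study's dataset).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=6, n_redundant=2,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Extra-trees importances decide which features to keep; by default
# SelectFromModel keeps features whose importance is at least the mean.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))

# The selected features feed a random forest. Putting both steps in one
# pipeline re-fits the selector inside every fold, which avoids leakage.
model = make_pipeline(selector, RandomForestClassifier(n_estimators=100, random_state=0))

# Stratified k-fold cross-validation (k=5 is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")

# Fit once on all data to inspect how many features survive selection.
model.fit(X, y)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())

print(f"kept {n_selected} of {X.shape[1]} features")
print(f"macro-F1 per fold: {scores.round(3)}")
```

Fitting the selector within each fold, rather than once on the full dataset, keeps the cross-validated scores from being inflated by features chosen with knowledge of the held-out samples.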