Research on parameter selection and optimization of C4.5 algorithm based on algorithm applicability knowledge base.

Authors

Zhang Yiyan, Xin Yi, Li Qin

Affiliations

Department of Electronic and Information Engineering, School of Intelligent Manufacturing, Qingdao Huanghai University, Qingdao, China.

Department of Biomedical Engineering, School of Medical Technology, Beijing Institute of Technology, Beijing, China.

Publication

Sci Rep. 2025 Aug 11;15(1):29418. doi: 10.1038/s41598-025-11901-2.

Abstract

The C4.5 decision tree algorithm predicts accurately on medical datasets and is highly interpretable, so this paper studies how to select its hyperparameters in order to tune the model quickly and accurately. Decision tree models are first built with different hyperparameter values and the performance of each model is evaluated; the evaluation results are then joined with the characteristic metadata of each dataset. Three evaluation measures (accuracy, AUC, and F1-measure) and 293 base datasets were used to build the meta-database of hyperparameter M optimization required by the study. A model learned from this meta-database then recommends the range of C4.5 hyperparameter values suited to datasets with different characteristics. The results show that for more than 65% of the datasets there is no need to tune the hyperparameter M, which avoids the time wasted on unnecessary tuning. The model that judges the optimal hyperparameter value reaches an accuracy of more than 80%. Testing and evaluation verify the feasibility of the recommended hyperparameter values, which provides an important basis for fast tuning and optimization of the C4.5 algorithm parameters.
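The meta-database step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define M, the default value, or how the three measures are combined. The sketch assumes M is the minimum number of instances per leaf with a default of 2 (as in Weka's J48), averages accuracy, AUC, and F1 into one score, and uses a hypothetical gain threshold `tolerance` to decide whether tuning is worthwhile. The function names `needs_tuning` and `meta_record` and the meta-feature names are likewise illustrative.

```python
from statistics import mean
from typing import Dict, Tuple

METRICS = ("accuracy", "auc", "f1")  # the three evaluation measures named in the abstract
DEFAULT_M = 2  # assumption: default of M (as in Weka J48); not stated in the abstract


def needs_tuning(scores_by_m: Dict[int, Dict[str, float]],
                 default_m: int = DEFAULT_M,
                 tolerance: float = 0.01) -> Tuple[bool, int]:
    """Return (tune?, best M) for one dataset.

    scores_by_m maps each tried value of M to its evaluation scores.
    Tuning counts as worthwhile only if the best M beats the default
    by more than `tolerance` on the averaged score (assumed rule).
    """
    def combined(m: int) -> float:
        return mean(scores_by_m[m][k] for k in METRICS)

    best_m = max(scores_by_m, key=combined)
    gain = combined(best_m) - combined(default_m)
    return gain > tolerance, best_m


def meta_record(meta_features: Dict[str, float],
                scores_by_m: Dict[int, Dict[str, float]]) -> Dict[str, object]:
    """Join dataset meta-features with the tuning label: one meta-database row."""
    tune, best_m = needs_tuning(scores_by_m)
    return {**meta_features,
            "tune_M": tune,
            "recommended_M": best_m if tune else DEFAULT_M}


# Illustrative numbers only, not results from the paper.
scores = {
    2:  {"accuracy": 0.80, "auc": 0.82, "f1": 0.78},
    5:  {"accuracy": 0.81, "auc": 0.82, "f1": 0.78},
    10: {"accuracy": 0.86, "auc": 0.88, "f1": 0.84},
}
row = meta_record({"n_instances": 500, "n_attributes": 12}, scores)
print(row)
```

A meta-model trained on rows like this one (meta-features as inputs, `tune_M` or `recommended_M` as the target) would play the role of the recommendation model described in the abstract.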

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4761/12340013/5c0bd0af5ee5/41598_2025_11901_Fig1_HTML.jpg