
Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.

Affiliation

School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, Australia.

Publication information

PLoS One. 2024 Apr 18;19(4):e0301541. doi: 10.1371/journal.pone.0301541. eCollection 2024.

DOI: 10.1371/journal.pone.0301541
PMID: 38635591
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11025817/
Abstract

Many individual studies have observed the superiority of tree-based machine learning (ML) algorithms, but the literature lacks statistical validation of this superiority. This study addresses the gap by applying five ML algorithms to 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based algorithms (decision tree and random forest) and three non-tree-based algorithms (support vector machine, logistic regression and k-nearest neighbour). Paired-sample t-tests show that both tree-based algorithms perform better than each non-tree-based algorithm on all four performance measures considered (accuracy, precision, recall and F1 score), each at the p<0.001 significance level. This superiority is consistent across both the model development and test phases. For further validation, the study also ran paired-sample t-tests on two subsets of the datasets, drawn from the disease prediction (66 datasets) and university-ranking (50 datasets) research contexts. The superiority held for both subsets: tree-based algorithms significantly outperformed non-tree-based algorithms on all four performance measures. We discuss the research implications of these findings in detail in this article.
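The paired-sample t-test named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `paired_t_statistic` and the accuracy values are hypothetical, made up for illustration. Each position in the two lists stands for the same dataset scored by two different algorithms, which is what makes the samples paired. The sketch uses only the standard library and returns the t statistic and degrees of freedom; a p-value could then be obtained from a t-distribution table or from a statistics package such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    scores_a[i] and scores_b[i] must be measurements on the same
    unit (here: the same dataset) under two conditions (algorithms).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")

    # The test operates on the per-pair differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-dataset accuracies (illustrative values, not from the paper).
random_forest_acc = [0.91, 0.84, 0.88, 0.79, 0.93, 0.86]
logistic_reg_acc = [0.87, 0.80, 0.85, 0.78, 0.88, 0.83]

t_stat, dof = paired_t_statistic(random_forest_acc, logistic_reg_acc)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A positive t here means the first algorithm scored higher on average across the paired datasets. In the study's design, the same comparison would be repeated for each tree-based versus non-tree-based pair and for each of the four performance measures.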

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2da1/11025817/c8ae7cab9769/pone.0301541.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2da1/11025817/1bc70832a69a/pone.0301541.g001.jpg
