Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery.

Author information

Debit Ahmed, Poulet Christophe, Josse Claire, Jerusalem Guy, Azencott Chloe-Agathe, Bours Vincent, Van Steen Kristel

Affiliations

Laboratory of Human Genetics, GIGA Institute, University of Liege (ULiege), Avenue Hippocrate 1/11, 4000 Liege, Belgium.

BIO3, GIGA Institute, University of Liege (ULiege), Avenue Hippocrate 1/11, 4000 Liege, Belgium.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf318.

DOI:10.1093/bib/bbaf318
PMID:40641044
Abstract

Biomarker signature discovery remains the main path to developing clinical diagnostic tools when the biological knowledge on pathology is weak. Shortest signatures are often preferred to reduce the cost of the diagnostic. The ability to find the best and shortest signature relies on the robustness of the models that can be built on such a set of molecules. The classification algorithm that will be used is often selected based on the average Area Under the Curve (AUC) performance of its models. However, it is not guaranteed that an algorithm with a large AUC distribution will keep a stable performance when facing data. Here, we propose two AUC-derived hyper-stability scores, the Hyper-stability Resampling Sensitive (HRS) and the Hyper-stability Signature Sensitive (HSS), as complementary metrics to the average AUC that should bring confidence in the choice for the best classification algorithm. To emphasize the importance of these scores, we compared 15 different Random Forest implementations. Our findings show that the Random Forest implementation should be chosen according to the data at hand and the classification question being evaluated. No Random Forest implementation can be used universally for any classification and on any dataset. Each of them should be tested for their average AUC performance and AUC-derived stability, prior to analysis.
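The abstract's central point, that an average AUC alone does not show how stable a classifier is, can be illustrated with a minimal sketch. This is not the authors' HRS or HSS score (the abstract does not define them), and it does not train a Random Forest: the labels and scores are synthetic, and the names `auc` and `resampled_auc` are hypothetical helpers introduced only for this illustration. It simply bootstraps a fixed set of predictions and reports the spread of AUC alongside its mean.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def resampled_auc(labels, scores, n_resamples=200, seed=0):
    """Bootstrap the samples and return one AUC value per resample."""
    rng = random.Random(seed)
    idx = range(len(labels))
    aucs = []
    while len(aucs) < n_resamples:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class only
        aucs.append(auc(ys, [scores[i] for i in sample]))
    return aucs


# Synthetic stand-in for classifier output (assumption: 30 cases, 30 controls).
data_rng = random.Random(42)
labels = [1] * 30 + [0] * 30
scores = [data_rng.gauss(1.0, 1.0) for _ in range(30)] + [
    data_rng.gauss(0.0, 1.0) for _ in range(30)
]

aucs = resampled_auc(labels, scores)
print(f"mean AUC = {statistics.mean(aucs):.3f}")
print(f"AUC standard deviation = {statistics.stdev(aucs):.3f}")
```

Two classifiers with the same mean AUC can differ widely in the second number, which is the kind of information a stability score is meant to add to the average.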
