Enriched Random Forest for High Dimensional Genomic Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2817-2828. doi: 10.1109/TCBB.2021.3089417. Epub 2022 Oct 10.

DOI:10.1109/TCBB.2021.3089417
PMID:34129502
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9923687/
Abstract

Ensemble methods such as random forest work well on high-dimensional datasets. However, when the number of features is extremely large compared to the number of samples and the percentage of truly informative features is very small, the performance of traditional random forest declines significantly. To this end, we develop a novel approach that enhances the performance of traditional random forest by reducing the contribution of trees whose nodes are populated with less informative features. The proposed method selects eligible subsets at each node by weighted random sampling, as opposed to the simple random sampling used in traditional random forest. We refer to this modified random forest algorithm as "Enriched Random Forest". Using several high-dimensional microarray datasets, we evaluate the performance of our approach in both regression and classification settings. In addition, we demonstrate the effectiveness of balanced leave-one-out cross-validation in reducing computational load and decreasing sample size while computing feature weights. Overall, the results indicate that enriched random forest improves the prediction accuracy of traditional random forest, especially when relevant features are very few.
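
The core idea in the abstract, choosing the candidate features at each node by weighted rather than simple random sampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how feature weights are computed, so the weighting below (absolute Pearson correlation with the response) is an assumption made purely for illustration, as are the function names `feature_weights`, `draw_candidates`, and `hit_rate` and the simulated data. The sketch only compares how often a node's candidate set contains at least one informative feature under each sampling scheme.

```python
import numpy as np

def feature_weights(X, y, eps=1e-12):
    # Hypothetical weighting (an assumption, not taken from the abstract):
    # absolute Pearson correlation of each feature with the response,
    # normalised so the weights form a probability vector.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + eps
    w = np.abs(Xc.T @ yc) / denom + eps
    return w / w.sum()

def draw_candidates(n_features, mtry, rng, weights=None):
    # Candidate features for one node, drawn without replacement.
    # weights=None gives the simple random sampling of a traditional forest;
    # passing a probability vector gives weighted random sampling.
    return rng.choice(n_features, size=mtry, replace=False, p=weights)

def hit_rate(informative, n_features, mtry, rng, weights=None, n_nodes=500):
    # Fraction of simulated nodes whose candidate set holds
    # at least one informative feature.
    hits = 0
    for _ in range(n_nodes):
        cand = draw_candidates(n_features, mtry, rng, weights)
        hits += bool(np.intersect1d(cand, informative).size)
    return hits / n_nodes

# Simulated "few samples, many features" data: 40 samples, 2000 features,
# of which only the first 5 carry signal.
rng = np.random.default_rng(0)
n, p, mtry = 40, 2000, 44  # mtry is roughly sqrt(p)
informative = np.arange(5)
X = rng.normal(size=(n, p))
y = X[:, informative].sum(axis=1) + 0.5 * rng.normal(size=n)

w = feature_weights(X, y)
simple = hit_rate(informative, p, mtry, rng)
weighted = hit_rate(informative, p, mtry, rng, weights=w)
print(f"simple sampling hit rate:   {simple:.3f}")
print(f"weighted sampling hit rate: {weighted:.3f}")
```

In a full forest, each node would then search for its best split only among the drawn candidates. The sketch stops at the sampling step because that is the only part the abstract specifies.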
