
Unsupervised random forests.

Authors

Mantero Alejandro, Ishwaran Hemant

Affiliation

Division of Biostatistics, University of Miami, Miami, Florida, USA.

Publication

Stat Anal Data Min. 2021 Apr;14(2):144-167. doi: 10.1002/sam.11498. Epub 2021 Feb 5.

DOI:10.1002/sam.11498
PMID:33833846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8025042/
Abstract

sidClustering is a new random forests unsupervised machine learning algorithm. The first step in sidClustering involves what is called sidification of the features: staggering the features to have mutually exclusive ranges (called the staggered interaction data [SID] main features) and then forming all pairwise interactions (called the SID interaction features). Then a multivariate random forest (able to handle both continuous and categorical variables) is used to predict the SID main features. We establish uniqueness of sidification and show how multivariate impurity splitting is able to identify clusters. The proposed sidClustering method is adept at finding clusters arising from categorical and continuous variables and retains all the important advantages of random forests. The method is illustrated using simulated and real data as well as two in-depth case studies, one from a large multi-institutional study of esophageal cancer, and the other involving hospital charges for cardiovascular patients.
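
The sidification step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact construction, so the function name `sidify`, the gap parameter `delta`, the plain shift used for staggering, and the use of products as pairwise interactions are all assumptions, and only continuous features are handled here.

```python
from itertools import combinations


def sidify(rows, delta=1.0):
    """Stagger features to disjoint ranges, then form pairwise interactions.

    Assumptions (not stated in the abstract): every feature is continuous,
    staggering is a simple shift with a gap of `delta` between consecutive
    ranges, and an interaction is the product of two staggered features.
    """
    cols = [list(c) for c in zip(*rows)]  # column-major view of the data
    main_cols = []
    lower = delta  # lower bound for the next staggered feature
    for col in cols:
        lo = min(col)
        shifted = [v - lo + lower for v in col]  # minimum now equals `lower`
        main_cols.append(shifted)
        lower = max(shifted) + delta  # next feature starts above this one
    pairs = list(combinations(range(len(cols)), 2))
    inter_cols = [
        [a * b for a, b in zip(main_cols[i], main_cols[j])]
        for i, j in pairs
    ]
    return main_cols, inter_cols, pairs


# Two features, three observations.
main, inter, pairs = sidify([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
```

Under these assumptions, `main` plays the role of the SID main features and `inter` the SID interaction features. In the method as the abstract describes it, a multivariate random forest would then be fit to predict the main features; that step is not shown here.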


