A Random Forests Quantile Classifier for Class Imbalanced Data.

Authors

Robert O'Brien, Hemant Ishwaran

Affiliation

Division of Biostatistics, University of Miami, Miami, FL 33136, USA.

Publication

Pattern Recognit. 2019 Jun;90:232-249. doi: 10.1016/j.patcog.2019.01.036. Epub 2019 Jan 29.

DOI:10.1016/j.patcog.2019.01.036
PMID:30765897
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6370055/
Abstract

Extending previous work on quantile classifiers ($q$-classifiers), we propose the $q^*$-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds $q^*$, where $0 < q^* < 1$ equals the unconditional probability of observing a minority class sample. The motivation for $q^*$-classification stems from a density-based approach and leads to the useful property that the $q^*$-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the $q^*$-classifier can achieve near-zero risk in imbalance problems while simultaneously optimizing true positive and true negative rates. We use random forests to apply $q^*$-classification. This new method, which we call RFQ, is shown to outperform, or be competitive with, existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
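
The density-based motivation and the cost-weighted Bayes equivalence stated in the abstract can be checked with a short standard derivation. This is a sketch in our own notation, not the paper's: $\pi_1 = q^*$ is the minority-class prevalence, $\pi_0 = 1 - \pi_1$ (both positive), $f_1$ and $f_0$ are the class-conditional densities of $x$, $p(x) = P(Y = 1 \mid x)$, $R$ is the region where the minority class is predicted, and $\ell_1$, $\ell_0$ are assumed costs of misclassifying a minority and a majority sample, respectively.

```latex
% Notation (assumed, not from the paper): \pi_1 = q^* = P(Y=1), \pi_0 = 1 - \pi_1,
% f_1, f_0 = class-conditional densities, p(x) = P(Y = 1 \mid x).
\begin{aligned}
p(x) &= \frac{\pi_1 f_1(x)}{\pi_1 f_1(x) + \pi_0 f_0(x)} \\[4pt]
p(x) > \pi_1
  &\iff \pi_1 f_1(x) > \pi_1\bigl(\pi_1 f_1(x) + \pi_0 f_0(x)\bigr) \\
  &\iff f_1(x) > \pi_1 f_1(x) + \pi_0 f_0(x) \\
  &\iff \pi_0 f_1(x) > \pi_0 f_0(x) \\
  &\iff f_1(x) > f_0(x) \\[4pt]
\mathrm{TPR}(R) + \mathrm{TNR}(R)
  &= \int_R f_1(x)\,dx + \int_{R^c} f_0(x)\,dx
   = 1 + \int_R \bigl(f_1(x) - f_0(x)\bigr)\,dx \\[4pt]
\text{predict } 1
  &\iff \ell_1\, p(x) > \ell_0\,\bigl(1 - p(x)\bigr)
   \iff p(x) > \frac{\ell_0}{\ell_0 + \ell_1}
\end{aligned}
```

The first chain shows that thresholding $p(x)$ at $q^* = \pi_1$ predicts the minority class exactly where $f_1(x) > f_0(x)$. The integral form shows that this region maximizes TPR + TNR, because it collects every point where $f_1 - f_0$ is positive. The last line shows that a cost-weighted Bayes rule uses the same threshold whenever $\ell_0 : \ell_1 = \pi_1 : \pi_0$, that is, when a missed minority sample is penalized in proportion to the majority prevalence.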

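A minimal, stdlib-only Python sketch of the decision rule follows, assuming binary labels coded 1 (minority) and 0 (majority). The helpers `minority_prevalence`, `q_classify`, and `g_mean` are hypothetical names, not part of any RFQ software, and the probabilities in the usage block are made-up stand-ins for the random-forest estimates that RFQ would supply. The sketch estimates $q^*$ as the minority fraction of the training labels, assigns the minority class when the conditional probability exceeds that value, and scores the result with the G-mean, the geometric mean of the true positive and true negative rates.

```python
from math import sqrt
from typing import List, Sequence


def minority_prevalence(y: Sequence[int]) -> float:
    """Estimate q* as the fraction of training labels equal to 1 (minority class)."""
    if not y:
        raise ValueError("need at least one label")
    return sum(1 for label in y if label == 1) / len(y)


def q_classify(probs: Sequence[float], q: float) -> List[int]:
    """Assign class 1 when the estimated P(Y=1 | x) exceeds q, else class 0."""
    return [1 if p > q else 0 for p in probs]


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(tpr * tnr)


if __name__ == "__main__":
    # Hypothetical training labels: 10% minority, so q* = 0.1.
    y_train = [1] * 10 + [0] * 90
    q_star = minority_prevalence(y_train)
    # Hypothetical stand-ins for random-forest estimates of P(Y=1 | x).
    probs = [0.05, 0.15, 0.35, 0.08, 0.60, 0.02, 0.12]
    y_test = [0, 1, 1, 0, 1, 0, 0]
    print(q_star)  # 0.1
    print(round(g_mean(y_test, q_classify(probs, q_star)), 3))  # 0.866
    print(round(g_mean(y_test, q_classify(probs, 0.5)), 3))  # 0.577
```

On this toy data the $q^*$ threshold trades one false positive for two additional correctly identified minority samples, raising the G-mean from about 0.577 (threshold 0.5) to about 0.866.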