Probability machines: consistent probability estimation using nonparametric learning machines.

Authors

Malley J D, Kruppa J, Dasgupta A, Malley K G, Ziegler A

Affiliation

Center for Computational Bioscience, Center for Information Technology, National Institutes of Health, Bethesda, USA.

Publication

Methods Inf Med. 2012;51(1):74-81. doi: 10.3414/ME00-01-0052. Epub 2011 Sep 14.

DOI: 10.3414/ME00-01-0052
PMID: 21915433
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3250568/

Abstract

BACKGROUND

Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem.
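
The step from regression to probability estimation rests on a standard identity for a binary response coded as 0/1. The notation below (Y, X, f) is chosen here for illustration and is not taken from the paper:

```latex
\[
f(x) \;=\; \mathbb{E}[Y \mid X = x]
\;=\; 1 \cdot P(Y = 1 \mid X = x) + 0 \cdot P(Y = 0 \mid X = x)
\;=\; P(Y = 1 \mid X = x).
\]
```

Hence any estimator that converges to the regression function f also converges to the individual probability P(Y = 1 | X = x).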

OBJECTIVES

The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities.

METHODS

Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians.
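
As a minimal sketch of the nearest-neighbor idea (not the authors' code, which the abstract says is in R), the Python example below averages the 0/1 labels of the k nearest training points, which is k-NN regression applied to a binary response. The function name `knn_probability`, the Euclidean distance, and the simulated logistic model are assumptions made for illustration.

```python
import math
import random


def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | X = query) as the mean 0/1 label of the k nearest points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # Averaging 0/1 labels is k-NN regression, so the output is a probability estimate.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical simulation: one feature, true probability from a logistic curve.
rng = random.Random(1)
true_p = lambda x: 1.0 / (1.0 + math.exp(-2.0 * x))
xs = [(rng.uniform(-2.0, 2.0),) for _ in range(2000)]
ys = [1 if rng.random() < true_p(x[0]) else 0 for x in xs]

estimate = knn_probability(xs, ys, (0.0,), k=200)
print(f"estimated P(Y=1 | x=0) = {estimate:.3f}, true value = {true_p(0.0):.3f}")
```

The classical consistency conditions for k-NN regression require k to grow with the sample size while k/n shrinks to zero, so the fixed k = 200 here is only a convenient choice for this sample size.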

RESULTS

Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software.
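
One common way to score probability estimates, in a simulation or on held-out data, is the Brier score: the mean squared difference between the predicted probability and the observed 0/1 outcome. The abstract does not say which accuracy measures the authors used, so the metric and the simulated model in this Python sketch are assumptions.

```python
import math
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical simulation with a known true probability for each individual.
rng = random.Random(7)
xs = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
true_probs = [1.0 / (1.0 + math.exp(-2.0 * x)) for x in xs]
ys = [1 if rng.random() < p else 0 for p in true_probs]

# Reference points: the true probabilities versus an uninformative constant 0.5.
score_true = brier_score(true_probs, ys)
score_constant = brier_score([0.5] * len(ys), ys)
print(f"Brier score, true probabilities: {score_true:.3f}")
print(f"Brier score, constant 0.5:       {score_constant:.3f}")
```

A probability machine that works well should score close to the value obtained with the true probabilities and clearly below the constant predictor.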

CONCLUSIONS

Random forest algorithms and nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Free implementations are available in R and can be used in applications.
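
To illustrate how a forest run in regression mode yields probabilities, the Python sketch below bags small regression trees fitted to 0/1 labels and averages their leaf means. It is a simplified stand-in, not the authors' R code: it handles a single feature only, and all function names, the leaf size, and the simulated logistic model are assumptions.

```python
import math
import random


def fit_tree(xs, ys, min_leaf=20):
    """Fit a 1-D regression tree; each leaf stores the mean 0/1 label."""
    n = len(xs)
    mean = sum(ys) / n
    if n < 2 * min_leaf:
        return mean
    pairs = sorted(zip(xs, ys))
    total = sum(ys)
    best_score, best_i = None, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if i < min_leaf or n - i < min_leaf or pairs[i - 1][0] == pairs[i][0]:
            continue
        # Maximising this quantity is equivalent to minimising squared error.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best_score is None or score > best_score:
            best_score, best_i = score, i
    if best_i is None:
        return mean
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2.0
    left = fit_tree([p[0] for p in pairs[:best_i]], [p[1] for p in pairs[:best_i]], min_leaf)
    right = fit_tree([p[0] for p in pairs[best_i:]], [p[1] for p in pairs[best_i:]], min_leaf)
    return (threshold, left, right)


def predict_tree(tree, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=50, min_leaf=20, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx], min_leaf))
    return forest


def predict_forest(forest, x):
    """Average the leaf means over all trees to get a probability estimate."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Hypothetical simulation: true probability follows a logistic curve in x.
rng = random.Random(3)
xs = [rng.uniform(-2.0, 2.0) for _ in range(1000)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else 0 for x in xs]
forest = fit_forest(xs, ys, n_trees=50, min_leaf=20, seed=3)
for x in (-1.5, 0.0, 1.5):
    print(f"x = {x:+.1f}: estimated P(Y=1) = {predict_forest(forest, x):.3f}")
```

With a single feature there is nothing to subsample at each split, so this reduces to bagging; a full random forest also draws a random subset of features at every split.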
