
As good as it gets? A new approach to estimating possible prediction performance.

Affiliations

Villanova School of Business, Villanova, PA, United States of America.

Robert H. Smith School of Business, University of Maryland, College Park, MD, United States of America.

Publication information

PLoS One. 2024 Oct 16;19(10):e0296904. doi: 10.1371/journal.pone.0296904. eCollection 2024.

DOI: 10.1371/journal.pone.0296904
PMID: 39413074
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11482679/
Abstract

How much information does a dataset contain about an outcome of interest? To answer this question, estimates are generated for a given dataset, representing the minimum possible absolute prediction error for an outcome variable that any model could achieve. The estimate is produced using a constrained omniscient model that mandates only that identical observations receive identical predictions, and that observations which are very similar to each other receive predictions that are alike. It is demonstrated that the resulting prediction accuracy bounds function effectively on both simulated data and real-world datasets. This method generates bounds on predictive performance typically within 10% of the performance of the true model, and performs well across a range of simulated and real datasets. Three applications of the methodology are discussed: measuring data quality, model evaluation, and quantifying the amount of irreducible error in a prediction problem.
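To make the first constraint in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: it enforces only the rule that identical observations receive identical predictions, and it leaves out the second rule (very similar observations receive similar predictions), whose exact form is not given in the abstract. The function name `identical_obs_mae_bound` and the toy data are hypothetical. Within each group of rows that share the same features, the single prediction that minimizes the summed absolute error is the group median, so the resulting mean absolute error is the lowest in-sample value any deterministic model of these features could reach.

```python
from collections import defaultdict
from statistics import median


def identical_obs_mae_bound(X, y):
    """Lowest in-sample mean absolute error attainable when rows with
    identical features must share a single prediction.

    Hypothetical helper for illustration; covers only the
    identical-observation constraint, not the similarity constraint.
    """
    # Collect the outcomes observed for each distinct feature vector.
    groups = defaultdict(list)
    for features, outcome in zip(X, y):
        groups[tuple(features)].append(outcome)

    total_abs_error = 0.0
    for outcomes in groups.values():
        # The median minimizes the sum of absolute deviations in a group.
        best = median(outcomes)
        total_abs_error += sum(abs(v - best) for v in outcomes)
    return total_abs_error / len(y)


# Toy data (made up): three rows share features (1, 0), two share (2, 5).
X = [(1, 0), (1, 0), (1, 0), (2, 5), (2, 5), (3, 1)]
y = [10.0, 12.0, 20.0, 7.0, 9.0, 4.0]
bound = identical_obs_mae_bound(X, y)
print(bound)  # 2.0
```

When every row has distinct features, this sketch returns 0, which is why a similarity constraint of the kind the abstract describes is needed before the bound becomes informative on continuous data.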

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/7abd4205e1ad/pone.0296904.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/97244911d5a6/pone.0296904.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/08abc39344f2/pone.0296904.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/64e2190302aa/pone.0296904.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/56db9709b234/pone.0296904.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/6812e106dfe7/pone.0296904.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5fe5/11482679/9692a12f838a/pone.0296904.g007.jpg

Similar articles

1. As good as it gets? A new approach to estimating possible prediction performance. PLoS One. 2024 Oct 16;19(10):e0296904. doi: 10.1371/journal.pone.0296904. eCollection 2024.
2. The C1C2: a framework for simultaneous model selection and assessment. BMC Bioinformatics. 2008 Sep 2;9:360. doi: 10.1186/1471-2105-9-360.
3. Error bounds for data-driven models of dynamical systems. Comput Biol Med. 2007 May;37(5):670-9. doi: 10.1016/j.compbiomed.2006.06.005. Epub 2006 Aug 8.
4. Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping. PLoS One. 2019 Nov 4;14(11):e0223529. doi: 10.1371/journal.pone.0223529. eCollection 2019.
5. Estimation of prediction error for survival models. Stat Med. 2010 Jan 30;29(2):262-74. doi: 10.1002/sim.3758.
6. Assessment and statistical modeling of the relationship between remotely sensed aerosol optical depth and PM2.5 in the eastern United States. Res Rep Health Eff Inst. 2012 May(167):5-83; discussion 85-91.
7. Improvement of Time Forecasting Models Using Machine Learning for Future Pandemic Applications Based on COVID-19 Data 2020-2022. Diagnostics (Basel). 2023 Mar 15;13(6):1121. doi: 10.3390/diagnostics13061121.
8. Prediction models for clustered data with informative priors for the random effects: a simulation study. BMC Med Res Methodol. 2018 Aug 6;18(1):83. doi: 10.1186/s12874-018-0543-5.
9. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
10. Correcting the optimal resampling-based error rate by estimating the error rate of wrapper algorithms. Biometrics. 2013 Sep;69(3):693-702. doi: 10.1111/biom.12041. Epub 2013 Jul 11.
