Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection.

Authors

Corbin Conor K, Baiocchi Michael, Chen Jonathan H

Affiliations

Department of Biomedical Data Science, Stanford, California, USA.

Center for Biomedical Informatics Research, Stanford, California, USA.

Publication information

AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:81-90. eCollection 2023.

PMID: 37350883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10283136/
Abstract

When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported performance metrics for binary machine learning models. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from the causal inference literature and find that when selection probabilities are properly specified, they recover full population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics, and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find it recovers true model performance.
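
The idea of correcting a naive estimate with selection weights can be illustrated with a small simulation. The sketch below is not the authors' code or one of their five scenarios. It assumes a hypothetical population with a single feature `x`, a "model" that scores patients by `x`, and a selection mechanism that depends only on `x`, with the selection probabilities known exactly (as they would be if the randomization were injected by design). The function `weighted_auroc` and all variable names are illustrative. Each observed patient is weighted by the inverse of their selection probability, and the AUROC is computed as the weighted share of correctly ranked positive-negative pairs.

```python
import math
import random
from itertools import groupby


def weighted_auroc(labels, scores, weights):
    """Weighted share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    rows = sorted(zip(scores, labels, weights))
    neg_below = 0.0  # total weight of negatives with a strictly lower score
    numerator = 0.0
    pos_total = 0.0
    for _, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        pos_w = sum(w for _, y, w in group if y == 1)
        neg_w = sum(w for _, y, w in group if y == 0)
        numerator += pos_w * (neg_below + 0.5 * neg_w)
        neg_below += neg_w
        pos_total += pos_w
    # After the loop, neg_below holds the total negative weight.
    return numerator / (pos_total * neg_below)


rng = random.Random(0)
n = 20000

# Hypothetical deployment population: the label depends on one feature x.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * x)) else 0 for x in xs]

# Assumed selection mechanism, driven by the observed feature only:
# labels are observed for 90% of patients with x > 0 and 10% of the rest.
p_select = [0.9 if x > 0 else 0.1 for x in xs]
observed = [rng.random() < p for p in p_select]

obs = [(y, x, p) for y, x, p, o in zip(ys, xs, p_select, observed) if o]
obs_y = [y for y, _, _ in obs]
obs_x = [x for _, x, _ in obs]
ipw = [1.0 / p for _, _, p in obs]  # inverse probability of selection weights

full_auc = weighted_auroc(ys, xs, [1.0] * n)  # needs every label; simulation only
naive_auc = weighted_auroc(obs_y, obs_x, [1.0] * len(obs))
ipw_auc = weighted_auroc(obs_y, obs_x, ipw)

print(f"full population: {full_auc:.3f}")
print(f"naive, observed only: {naive_auc:.3f}")
print(f"inverse-probability weighted: {ipw_auc:.3f}")
```

In practice the selection probabilities are rarely known and must be modeled; as the abstract notes, the weighted estimate recovers the full-population value only when those probabilities are properly specified.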
