
Optimal Surrogate-Assisted Sampling for Cost-Efficient Validation of Electronic Health Record Outcomes.

Authors

Marks-Anglin Arielle, Chen Jianmin, Luo Chongliang, Hubbard Rebecca, Chen Yong

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Division of Public Health Sciences, Washington University School of Medicine, St Louis, MO, USA.

Publication information

Stat Med. 2025 May;44(10-12):e70095. doi: 10.1002/sim.70095.

DOI:10.1002/sim.70095
PMID:40404279
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12097881/
Abstract

Electronic Health Record (EHR) databases are an increasingly valuable resource for observational studies. However, misclassification of EHR-derived outcomes due to imperfect phenotyping leads to bias, inflated type I error, and reduced power in risk-factor association studies. On the other hand, manual chart review to validate outcomes is both cost-prohibitive and time-consuming, and a randomly selected validation sample may not yield sufficient cases to support precise model estimation when the disease is rare. Sampling procedures have been developed for maximizing computational and statistical efficiency in settings where the true disease status is known. However, less work has been done in measurement constrained settings, particularly when an informative surrogate outcome is available. Motivated by this gap, we propose an Optimal Subsampling strategy with Surrogate-Assisted Two-step procedure (OSSAT) to guide cost-effective chart review in measurement constrained settings. The sampling weight in OSSAT leverages information contained in the potentially misclassified phenotype and covariates to prioritize observations most informative for the model of interest. We compare our proposed weight with existing approaches through simulations under various covariate distributions, differential misclassification rates and degrees of surrogate accuracy. We then apply our proposed weighting schemes to a study of risk factors for second breast cancer events using a real EHR data set.
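
The abstract does not state the OSSAT weight formula, so the following is only a minimal sketch of the general two-step idea it describes, assuming a logistic outcome model. A uniform pilot sample is chart-reviewed and used to fit a pilot model; the rest of the review budget is then spent with probabilities driven by the surrogate outcome and covariates, and the final fit uses inverse-probability weights. The score `|s_i - p_i| * ||x_i||`, the uniform mixing fraction `mix`, the `review` callback, and all function names are illustrative assumptions, not the authors' method.

```python
import numpy as np


def fit_logistic(X, y, w=None, n_iter=100, tol=1e-8):
    """Weighted logistic regression via Newton-Raphson.

    X must already contain an intercept column.
    """
    n, d = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    w = w / w.mean()  # rescaling weights does not change the maximizer
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        # Tiny ridge on the Hessian only, to keep the linear solve stable.
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(d)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def sampling_probs(X, s, beta_pilot, mix=0.1):
    """Assumed surrogate-assisted sampling probabilities (illustrative only).

    Records whose surrogate s disagrees with the pilot model's fitted
    probability, and whose covariate vector is large, get more mass.
    Mixing with the uniform distribution bounds the inverse weights.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta_pilot)))
    score = np.abs(s - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    return (1.0 - mix) * pi + mix / len(s)


def two_step_estimate(X, s, review, n_pilot, n_second, rng):
    """Two-step subsampling; review(idx) stands in for manual chart review."""
    n = X.shape[0]

    # Step 1: uniform pilot sample, reviewed to obtain true outcomes.
    pilot = rng.choice(n, size=n_pilot, replace=False)
    y_pilot = review(pilot)
    beta_pilot = fit_logistic(X[pilot], y_pilot)

    # Step 2: surrogate-guided sample drawn with replacement.
    pi = sampling_probs(X, s, beta_pilot)
    second = rng.choice(n, size=n_second, replace=True, p=pi)
    y_second = review(second)

    # Combine both samples with inverse-probability weights
    # (pilot draws are treated as having probability 1/n each).
    idx = np.concatenate([pilot, second])
    y = np.concatenate([y_pilot, y_second])
    w = np.concatenate([np.full(n_pilot, float(n)), 1.0 / pi[second]])
    return fit_logistic(X[idx], y, w), pi, idx
```

In this sketch only `n_pilot + n_second` records are ever passed to `review`, which is where the cost saving over reviewing every chart would come from. The inverse-probability weights are what keep the final estimate targeted at the full-cohort model even though sampling depends on the surrogate.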

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06e7/12097881/4256d36c4186/SIM-44-0-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06e7/12097881/4f659e2259fa/SIM-44-0-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06e7/12097881/f46d1c335d78/SIM-44-0-g002.jpg