SAT: a Surrogate-Assisted Two-wave case boosting sampling method, with application to EHR-based association studies.

Affiliations

Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA.

Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA.

Publication information

J Am Med Inform Assoc. 2022 Apr 13;29(5):918-927. doi: 10.1093/jamia/ocab267.

DOI: 10.1093/jamia/ocab267
PMID: 34962283
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9714591/

Abstract

OBJECTIVES

Electronic health records (EHRs) enable investigation of the association between phenotypes and risk factors. However, studies that rely solely on potentially error-prone EHR-derived phenotypes (i.e., surrogates) are subject to bias. Analyses of low-prevalence phenotypes may also suffer from poor efficiency. Existing methods typically address one of these issues but seldom both. This study addresses both simultaneously by developing new sampling methods that select an optimal subsample in which to collect gold-standard phenotypes, thereby improving the accuracy of association estimates.

MATERIALS AND METHODS

We develop a surrogate-assisted two-wave (SAT) sampling method in which a surrogate-guided sampling (SGS) procedure and a modified version of the optimal subsampling method under the A-optimality criterion (OSMAC) are applied sequentially to select a subsample for outcome validation by manual chart review under a fixed budget. An association model is then fitted to this subsample using the validated (true) phenotypes. We demonstrate the effectiveness of SAT through simulation studies and an application to an EHR dataset of breast cancer survivors.
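The two-wave idea can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 50/50 split of the wave-1 budget across surrogate strata, the use of the surrogate `S` in place of the not-yet-reviewed outcome when computing OSMAC-style (mVc) scores |y - p̂|·‖x‖, the with-replacement wave-2 draw, and fitting the final inverse-probability-weighted logistic model on wave-2 records only are all our own simplifications. The helper names (`sat_sketch`, `chart_review`, `fit_weighted_logistic`) and the simulated cohort are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logistic(X, y, w, iters=30):
    """Weighted logistic-regression MLE via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def sat_sketch(X, S, chart_review, n1, n2, rng):
    """Hypothetical two-wave design: surrogate-guided wave 1, OSMAC-style wave 2."""
    N = len(S)
    # Wave 1 (assumption): split the budget evenly between S == 1 and S == 0 records,
    # which enriches the subsample with cases when the surrogate is informative.
    idx_parts, w_parts = [], []
    for s in (1, 0):
        stratum = np.flatnonzero(S == s)
        k = min(n1 // 2, len(stratum))
        idx_parts.append(rng.choice(stratum, size=k, replace=False))
        w_parts.append(np.full(k, len(stratum) / k))  # inverse sampling fraction
    idx1, w1 = np.concatenate(idx_parts), np.concatenate(w_parts)
    y1 = chart_review(idx1)  # gold-standard outcomes for wave-1 records
    beta_pilot = fit_weighted_logistic(X[idx1], y1, w1)

    # Wave 2: mVc-type scores |y - p_hat| * ||x||, with S standing in for unknown Y
    # (assumption). Records may be redrawn; a real design would reuse earlier reviews.
    p_hat = sigmoid(X @ beta_pilot)
    score = np.abs(S - p_hat) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx2 = rng.choice(N, size=n2, replace=True, p=pi)
    y2 = chart_review(idx2)
    # Inverse-probability weights for with-replacement sampling
    beta = fit_weighted_logistic(X[idx2], y2, 1.0 / (n2 * pi[idx2]))
    return beta, y1.mean(), y2.mean()

# Hypothetical simulated cohort: rare outcome, surrogate with sensitivity 0.90 and specificity 0.95
rng = np.random.default_rng(2022)
N = 20_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Y = rng.binomial(1, sigmoid(X @ np.array([-4.0, 0.5, 0.5])))
S = np.where(Y == 1, rng.binomial(1, 0.90, N), rng.binomial(1, 0.05, N))

beta_hat, prev_wave1, prev_wave2 = sat_sketch(X, S, lambda idx: Y[idx], n1=400, n2=600, rng=rng)
print("population case prevalence:", Y.mean())
print("wave-1 / wave-2 case prevalence:", prev_wave1, prev_wave2)
print("estimated coefficients:", beta_hat)
```

Wave 1 uses the surrogate to enrich the validation sample with cases and yields a pilot estimate; wave 2 then concentrates the remaining chart reviews on records whose scores suggest they carry the most information about the coefficients. Weighting by inverse sampling probabilities keeps the fit aimed at the full-cohort association despite the outcome-dependent sampling.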

RESULTS

We found that the subsample selected with the proposed method contains informative observations that effectively reduce the mean squared error of the resulting association estimator.

CONCLUSIONS

The proposed approach addresses both the rarity of cases and the misclassification of the surrogate in EHR-based association studies that lack gold-standard phenotypes. With a well-behaved surrogate, SAT successfully boosts the case prevalence in the subsample and improves the efficiency of estimation.
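Why the surrogate must be well behaved follows from Bayes' rule. Suppose, hypothetically, that a fraction `f` of the validation budget is drawn from surrogate-positive records and the rest from surrogate-negative records, and that the surrogate has sensitivity `se` and specificity `sp` for a phenotype with prevalence `prev`. The expected case prevalence in the subsample is then f·PPV + (1 - f)·FOR, where FOR = P(Y = 1 | S = 0) is the false omission rate. The parameter values below are illustrative, not taken from the paper.

```python
def expected_case_prevalence(prev, se, sp, f=0.5):
    """Expected share of true cases when a fraction f of the subsample comes from S = 1 records."""
    p_s1 = se * prev + (1 - sp) * (1 - prev)        # P(S = 1)
    ppv = se * prev / p_s1                           # P(Y = 1 | S = 1)
    false_omission = (1 - se) * prev / (1 - p_s1)    # P(Y = 1 | S = 0)
    return f * ppv + (1 - f) * false_omission

# Informative surrogate: the case share rises from 2% to about 13.5%
print(round(expected_case_prevalence(0.02, 0.90, 0.95), 4))  # 0.1354
# Uninformative surrogate (se = 1 - sp): no boost over the population prevalence
print(round(expected_case_prevalence(0.02, 0.50, 0.50), 4))  # 0.02
```

When the surrogate carries no information (se = 1 - sp), S is independent of Y, so PPV and FOR both equal the population prevalence and surrogate-guided sampling yields no more cases than simple random sampling.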
