Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification.

Affiliation

Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

Biometrics. 2022 Mar;78(1):214-226. doi: 10.1111/biom.13400. Epub 2020 Dec 3.

DOI:10.1111/biom.13400
PMID:33179768
Abstract

Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-varying factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies. For all methods proposed, we derive valid standard error estimators and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative, a longitudinal EHR-linked biorepository.
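To make the selection-bias idea concrete, the following is a minimal simulated sketch of inverse probability weighting in a logistic association model. It is not the authors' software or their estimator: the data-generating model, the selection model, the helper name `fit_weighted_logistic`, and the use of known (rather than estimated) selection probabilities are all assumptions made for illustration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logistic(X, y, w, n_iter=25):
    """Hypothetical helper: weighted logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        score = X.T @ (w * (y - p))                      # weighted score vector
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta

# Simulated source population (assumed model): logit P(D=1 | x) = -1 + 0.5 * x
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
d = rng.binomial(1, expit(-1.0 + 0.5 * x))

# Assumed selection mechanism: inclusion depends on disease status and covariate jointly
p_select = expit(-1.0 + d * x)
s = rng.binomial(1, p_select).astype(bool)

# Analysis restricted to the selected (EHR-like) sample
X = np.column_stack([np.ones(s.sum()), x[s]])
y = d[s].astype(float)

beta_naive = fit_weighted_logistic(X, y, np.ones(s.sum()))  # ignores selection
beta_ipw = fit_weighted_logistic(X, y, 1.0 / p_select[s])   # weights = 1 / P(selected)

print("naive slope:", beta_naive[1])
print("IPW slope:  ", beta_ipw[1])
```

In this toy setup the unweighted slope drifts away from the simulated value of 0.5, while the weighted fit stays close to it. In practice the selection probabilities are unknown and have to be estimated, for example from external summary data, and standard errors need to account for the weights.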

Similar articles

1
Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification.
Biometrics. 2022 Mar;78(1):214-226. doi: 10.1111/biom.13400. Epub 2020 Dec 3.
2
Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification.
Stat Med. 2022 Dec 10;41(28):5501-5516. doi: 10.1002/sim.9579. Epub 2022 Sep 21.
3
An analytic framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records.
Stat Med. 2020 Jun 30;39(14):1965-1979. doi: 10.1002/sim.8524. Epub 2020 Mar 20.
4
Bias reduction and inference for electronic health record data under selection and phenotype misclassification: three case studies.
medRxiv. 2020 Dec 23:2020.12.21.20248644. doi: 10.1101/2020.12.21.20248644.
5
To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice.
J Am Med Inform Assoc. 2024 Jun 20;31(7):1479-1492. doi: 10.1093/jamia/ocae098.
6
Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city.
BMC Med Res Methodol. 2020 Apr 6;20(1):77. doi: 10.1186/s12874-020-00956-6.
7
A framework for understanding selection bias in real-world healthcare data.
J R Stat Soc Ser A Stat Soc. 2024 May 2;187(3):606-635. doi: 10.1093/jrsssa/qnae039. eCollection 2024 Aug.
8
Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: Empirical illustration using breast cancer recurrence.
Pharmacoepidemiol Drug Saf. 2019 Feb;28(2):264-268. doi: 10.1002/pds.4680. Epub 2018 Oct 30.
9
Addressing Information Biases Within Electronic Health Record Data to Improve the Examination of Epidemiologic Associations With Diabetes Prevalence Among Young Adults: Cross-Sectional Study.
JMIR Med Inform. 2024 Oct 1;12:e58085. doi: 10.2196/58085.
10
Improved generalized raking estimators to address dependent covariate and failure-time outcome error.
Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

Cited by

1
Sensitivity Analysis for Binary Outcome Misclassification in Randomization Tests via Integer Programming.
J Comput Graph Stat. 2025 Apr 17. doi: 10.1080/10618600.2025.2461222.
2
Identification and Validation of Novel Combinatorial Genetic Risk Factors for Endometriosis across Multiple UK and US Patient Cohorts.
medRxiv. 2025 Aug 15:2025.08.13.25333595. doi: 10.1101/2025.08.13.25333595.
3
Quantitative bias analysis for mismeasured variables in health research: a review of software tools.
BMC Med Res Methodol. 2025 Aug 1;25(1):187. doi: 10.1186/s12874-025-02635-w.
4
Epidemiology of Aspergillosis Diagnoses in the U.S. using a National EHR Database, 2013-2023.
medRxiv. 2025 Jun 23:2025.06.19.25329882. doi: 10.1101/2025.06.19.25329882.
5
Reproducibility of genetic risk factors identified for long COVID using combinatorial analysis across US and UK patient cohorts with diverse ancestries.
J Transl Med. 2025 May 8;23(1):516. doi: 10.1186/s12967-025-06535-x.
6
Detection Bias in EHR-Based Research on Clinical Exposures and Dementia.
JAMA Netw Open. 2025 Apr 1;8(4):e256637. doi: 10.1001/jamanetworkopen.2025.6637.
7
The association between long-term exposure to PM2.5 constituents and ischemic stroke in the New York City metropolitan area.
Chemosphere. 2025 Jun;378:144390. doi: 10.1016/j.chemosphere.2025.144390. Epub 2025 Apr 8.
8
Transparency in the secondary use of health data: assessing the status quo of guidance and best practices.
R Soc Open Sci. 2025 Mar 26;12(3):241364. doi: 10.1098/rsos.241364. eCollection 2025 Mar.
9
Exposure to autoimmune disorders is associated with increased Alzheimer's disease risk in a multi-site electronic health record analysis.
Cell Rep Med. 2025 Mar 18;6(3):101980. doi: 10.1016/j.xcrm.2025.101980. Epub 2025 Feb 24.
10
Secure and federated genome-wide association studies for biobank-scale datasets.
Nat Genet. 2025 Apr;57(4):809-814. doi: 10.1038/s41588-025-02109-1. Epub 2025 Feb 24.