Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data.

Affiliations

Center for Health AI and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; The Graduate Group in Applied Mathematics and Computational Science, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA.

Center for Health AI and Synthesis of Evidence (CHASE), Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Biomed Inform. 2024 Sep;157:104690. doi: 10.1016/j.jbi.2024.104690. Epub 2024 Jul 14.

Abstract

OBJECTIVES

It has become increasingly common to develop multiple computable phenotypes from electronic health records (EHR) for a given condition. However, EHR-based association studies often rely on a single phenotype. In this paper, we develop a method that simultaneously uses multiple EHR-derived phenotypes to reduce bias due to phenotyping error and to improve the efficiency of estimated phenotype/exposure associations.

MATERIALS AND METHODS

The proposed method combines multiple algorithm-derived phenotypes with a small set of validated outcomes to reduce bias and improve estimation accuracy and efficiency. We evaluated its performance through simulation studies and a real-world analysis of colon cancer recurrence using EHR data from Kaiser Permanente Washington.

RESULTS

In settings where no single surrogate performed uniformly better than all others in both sensitivity and specificity, our method achieved substantial bias reduction compared with using a single algorithm-derived phenotype. Our method also improved estimation efficiency by up to 30% relative to an estimator that used only one algorithm-derived phenotype.

DISCUSSION

Simulation studies and an application to real-world data demonstrated that our method effectively integrates multiple phenotypes, reducing bias and improving statistical accuracy and efficiency.

CONCLUSIONS

Our method combines information across multiple surrogates using a statistically efficient seemingly unrelated regression framework. It provides a robust alternative to single-surrogate-based bias correction, especially when it is unknown which surrogate is superior.
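The abstract does not spell out the estimator, so the following is only a minimal sketch of the general principle behind pooling correlated estimates in a seemingly unrelated regression (generalized least squares) setting. It is not the authors' implementation. The sketch assumes we already have K estimates of the same association parameter (for example, one bias-corrected log odds ratio per surrogate) together with their joint K x K covariance matrix. Under that assumption, the minimum-variance linear unbiased combination uses weights proportional to the inverse covariance applied to a vector of ones. The function names `solve` and `combine_estimates` and the numbers in the example are hypothetical.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def combine_estimates(
    estimates: List[float], cov: List[List[float]]
) -> Tuple[float, float, List[float]]:
    """Hypothetical helper: GLS-style pooling of K correlated estimates
    of one common parameter.

    Weights are w = S^{-1} 1 / (1' S^{-1} 1); the pooled variance is
    1 / (1' S^{-1} 1), which is never larger than any single variance.
    """
    k = len(estimates)
    sinv_one = solve(cov, [1.0] * k)  # S^{-1} 1
    denom = sum(sinv_one)  # 1' S^{-1} 1
    weights = [v / denom for v in sinv_one]
    combined = sum(w * b for w, b in zip(weights, estimates))
    return combined, 1.0 / denom, weights


if __name__ == "__main__":
    # Two hypothetical surrogate-specific estimates and their covariance.
    est, var, w = combine_estimates([0.50, 0.40], [[0.04, 0.01], [0.01, 0.05]])
    print(round(est, 4), round(var, 5), [round(x, 4) for x in w])
    # prints: 0.4571 0.02714 [0.5714, 0.4286]
```

In this made-up example, the pooled variance (about 0.0271) is smaller than the variance of either estimate alone (0.04 or 0.05). This illustrates why drawing on several surrogates can improve efficiency. In practice, the joint covariance would have to come from the fitted model, for example a sandwich variance estimate, and that step is outside the scope of this sketch.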
