A strategy for validation of variables derived from large-scale electronic health record data.

Affiliations

VA San Diego Healthcare System, 3500 La Jolla Village Dr, San Diego, CA 92161, USA; University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.

University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.

Publication information

J Biomed Inform. 2021 Sep;121:103879. doi: 10.1016/j.jbi.2021.103879. Epub 2021 Jul 27.

Abstract

PURPOSE

Standardized approaches for rigorously validating phenotypes derived from large-scale electronic health record (EHR) data have not been widely reported. We propose a methodologically rigorous and efficient approach to guide such validation, hereafter referred to as the San Diego Approach to Variable Validation (SDAVV), which includes strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process.

METHODS

We propose sample size formulae, to be applied before chart review, that are based on pre-specified critical lower bounds for the positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative algorithm development and validation cycles, updating the sample sizes for data abstraction until both PPV and NPV reach their target performance.
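The abstract does not reproduce the SDAVV formulae themselves. As a rough illustration only, the sketch below assumes a one-sided Wilson score lower bound (not necessarily the bound the authors used) and searches for the smallest number of charts whose lower bound clears a pre-specified critical value when the observed PPV or NPV equals an anticipated value. It also sketches a stopping rule that ends the development/validation cycle only once both lower bounds clear their thresholds. The names `wilson_lower`, `required_sample_size`, and `meets_targets` are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_lower(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """One-sided Wilson score lower confidence bound for a binomial proportion.

    Assumption: Wilson is used here for illustration; SDAVV may use another bound.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value (~1.645 for 95%)
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def required_sample_size(expected: float, critical_lower: float,
                         confidence: float = 0.95, n_max: int = 100_000) -> int:
    """Smallest number of charts to review so that, if the observed PPV (or NPV)
    equals `expected`, its lower confidence bound is at least `critical_lower`."""
    if not 0 < critical_lower < expected <= 1:
        raise ValueError("need 0 < critical_lower < expected <= 1")
    for n in range(1, n_max + 1):
        if wilson_lower(expected, n, confidence) >= critical_lower:
            return n
    raise ValueError("no sample size up to n_max meets the target")


def meets_targets(true_pos: int, n_pos: int, true_neg: int, n_neg: int,
                  crit_ppv: float, crit_npv: float, confidence: float = 0.95) -> bool:
    """Hypothetical stopping rule: stop iterating only when BOTH the PPV and
    NPV lower bounds (from reviewed algorithm-positive / algorithm-negative
    charts) reach their pre-specified critical values."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("need at least one reviewed chart in each group")
    ppv_lb = wilson_lower(true_pos / n_pos, n_pos, confidence)
    npv_lb = wilson_lower(true_neg / n_neg, n_neg, confidence)
    return ppv_lb >= crit_ppv and npv_lb >= crit_npv
```

In an iterative cycle, one would call `required_sample_size` before each round of chart review, revise the algorithm if `meets_targets` returns `False`, and repeat with a fresh sample.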

RESULTS

We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms: one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls, and one for identifying aspirin exposure. For identifying normal colonoscopy cases, the estimated PPV and NPV both reached 0.970, each with a 95% lower confidence bound of 0.915, and the estimated sensitivity and specificity were 0.963 and 0.975, respectively. The phenotyping algorithm for identifying aspirin exposure reached a PPV of 0.990 (95% lower bound 0.950) and an NPV of 0.980 (95% lower bound 0.930), with sensitivity and specificity of 0.960 and 1.000, respectively.
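The abstract reports sensitivity and specificity alongside PPV and NPV but does not say how they were derived. One standard route, assumed here rather than taken from the paper, applies Bayes' rule when charts are sampled separately from algorithm-positive and algorithm-negative records. With q the proportion of all records the algorithm flags as positive, sensitivity = PPV·q / (PPV·q + (1 − NPV)(1 − q)) and specificity = NPV(1 − q) / (NPV(1 − q) + (1 − PPV)q). The helper name below is hypothetical.

```python
def sens_spec_from_predictive_values(ppv: float, npv: float,
                                     frac_test_pos: float) -> tuple[float, float]:
    """Recover sensitivity and specificity from PPV, NPV, and the share of ALL
    records that the algorithm labels positive (assumed known from the full EHR
    extract). Assumes stratified chart review of algorithm+ and algorithm- records."""
    q = frac_test_pos
    if not 0 < q < 1:
        raise ValueError("frac_test_pos must lie strictly between 0 and 1")
    true_pos = ppv * q               # P(algorithm+, condition+)
    false_neg = (1 - npv) * (1 - q)  # P(algorithm-, condition+)
    true_neg = npv * (1 - q)         # P(algorithm-, condition-)
    false_pos = (1 - ppv) * q        # P(algorithm+, condition-)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

This design keeps the chart-review sample small in both strata while still yielding population-level sensitivity and specificity, provided the algorithm-positive share q is measured on the full cohort.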

CONCLUSIONS

A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented and should be considered as a means of improving the quality of "big data" research.
