Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms.

Affiliations

Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Publication information

J Am Med Inform Assoc. 2024 Feb 16;31(3):640-650. doi: 10.1093/jamia/ocad226.

DOI: 10.1093/jamia/ocad226

PMID: 38128118

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10873838/

Abstract

OBJECTIVE

High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity).

MATERIALS AND METHODS

ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB).
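
The imputation-then-estimation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract points to R software). It assumes a Gaussian-kernel Nadaraya-Watson smoother with a fixed, hand-picked bandwidth as the nonparametric imputation step, and it runs on simulated scores and labels. The function names (sup_roc, impute_labels, ss_roc), the bandwidth, the cutoff, and the simulation settings are all hypothetical choices made for illustration.

```python
import numpy as np


def sup_roc(scores, labels, cutoff):
    """Supervised TPR/FPR at one cutoff, using labeled patients only."""
    pred = scores >= cutoff
    tpr = np.sum(pred & (labels == 1)) / np.sum(labels == 1)
    fpr = np.sum(pred & (labels == 0)) / np.sum(labels == 0)
    return tpr, fpr


def impute_labels(scores_lab, labels_lab, scores_all, bandwidth=0.1):
    """Kernel-smoothed estimate of P(Y = 1 | score) at every score.

    Assumption: Gaussian kernel with a fixed bandwidth; the abstract
    only says the imputation is nonparametric.
    """
    diffs = (scores_all[:, None] - scores_lab[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights @ labels_lab) / weights.sum(axis=1)


def ss_roc(scores_all, imputed, cutoff):
    """TPR/FPR at one cutoff, with imputed probabilities in place of labels."""
    pred = scores_all >= cutoff
    tpr = np.sum(imputed * pred) / np.sum(imputed)
    fpr = np.sum((1 - imputed) * pred) / np.sum(1 - imputed)
    return tpr, fpr


# Simulated data (hypothetical): 5000 patients, 100 of them labeled.
rng = np.random.default_rng(0)
n_total, n_labeled = 5000, 100
y = rng.binomial(1, 0.3, size=n_total)
# PA score in (0, 1): higher on average for true cases.
scores = 1.0 / (1.0 + np.exp(-(2.0 * y - 1.0 + rng.normal(size=n_total))))

# The first n_labeled patients play the role of the chart-reviewed subset.
scores_lab, labels_lab = scores[:n_labeled], y[:n_labeled]

cutoff = 0.5
sup_tpr, sup_fpr = sup_roc(scores_lab, labels_lab, cutoff)
imputed = impute_labels(scores_lab, labels_lab, scores)
ss_tpr, ss_fpr = ss_roc(scores, imputed, cutoff)

print(f"supROC: TPR={sup_tpr:.3f}, FPR={sup_fpr:.3f}")
print(f"ssROC:  TPR={ss_tpr:.3f}, FPR={ss_fpr:.3f}")
```

In this toy setup, sup_roc uses only the 100 labeled patients, while ss_roc averages the imputed label probabilities over all 5000 scores, which is where a gain in precision would be expected to come from. A real analysis would also need a data-driven bandwidth and a variance estimate, neither of which is shown here.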

RESULTS

ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the ssROC estimates were, on average, 30% to 60% less variable than the supROC estimates.

DISCUSSION

ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easy to implement in open-source R software.

CONCLUSION

When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
