Semi-supervised calibration of noisy event risk (SCANER) with electronic health records.

Affiliations

Duke University, Durham, NC, USA; Harvard Medical School, Boston, MA, USA.

Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Publication information

J Biomed Inform. 2023 Aug;144:104425. doi: 10.1016/j.jbi.2023.104425. Epub 2023 Jun 16.

Abstract

OBJECTIVE

Electronic health records (EHRs), which contain detailed longitudinal clinical information on large and broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, because EHRs were originally built for administrative purposes rather than for research, EHR-linked studies often cannot capture reliable information for analytical variables, especially in the survival setting, where both accurate event status and accurate event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often depends on complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time, such as the time to the first mention of progression in the notes, are at best good approximations of the true event time. This makes it difficult to estimate event rates efficiently for an EHR patient cohort. Estimating survival rates from error-prone outcome definitions can bias results and reduce power in downstream analyses. On the other hand, extracting accurate event times via manual annotation is time- and resource-intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data.

MATERIALS AND METHODS

In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that effectively overcomes censoring-induced dependency and attains more robust performance (i.e., it is not sensitive to misspecification of the imputation model) by fully utilizing both a small labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured from the EHR in the unlabeled set. We validate the SCANER estimator by estimating PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and ICU-free survival rates for COVID patients from two large tertiary care centers.

RESULTS

In terms of survival rate estimates, SCANER produced point estimates very similar to those of the complete-case Kaplan-Meier (KM) estimator. In contrast, the other benchmark methods, which fail to account for the induced dependency between the event time and the censoring time conditional on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to a 50% efficiency gain.
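For readers unfamiliar with the KM benchmark named above, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator applied to a labeled set. It is not the authors' code and does not implement SCANER; the function name `kaplan_meier` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Textbook Kaplan-Meier estimator (illustrative, not the paper's code).

    times  : observed follow-up time per patient (event or censoring time)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct time with an event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    n_at_risk = len(data)   # patients still under observation
    survival = 1.0          # running product-limit estimate
    curve = []

    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy labeled set: five patients, two of them censored.
toy_times = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"S({t}) = {s:.4f}")
```

A complete-case analysis of this kind uses only the manually annotated patients, which is why its standard errors can be large when the labeled set is small; the abstract reports that SCANER reduces them by also drawing on the unlabeled proxy features.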

CONCLUSION

The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates than existing approaches. This promising new approach can also improve the resolution (i.e., the granularity of event times) by using labels conditional on multiple surrogates, particularly for less common or poorly coded conditions.
