Semisupervised Calibration of Risk with Noisy Event Times (SCORNET) using electronic health record data.

Author affiliations

Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.

Department of Statistics, University of California Davis, 1 Shields Avenue, Davis, CA 95616, USA.

Publication information

Biostatistics. 2023 Jul 14;24(3):760-775. doi: 10.1093/biostatistics/kxac003.

Abstract

Leveraging large-scale electronic health record (EHR) data to estimate survival curves for clinical events can enable more powerful risk estimation and comparative effectiveness research. However, use of EHR data is hindered by a lack of direct event time observations. Occurrence times of relevant diagnostic codes or target disease mentions in clinical notes are at best a good approximation of the true disease onset time. On the other hand, extracting precise information on the exact event time requires laborious manual chart review and is sometimes altogether infeasible due to a lack of detailed documentation. Current status labels (binary indicators of phenotype status during follow-up) are significantly more efficient and feasible to compile, enabling more precise survival curve estimation given limited resources. Existing survival analysis methods using current status labels focus almost entirely on supervised estimation, and naive incorporation of unlabeled data into these methods may lead to biased estimates. In this article, we propose Semisupervised Calibration of Risk with Noisy Event Times (SCORNET), which yields a consistent and efficient survival function estimator by leveraging a small set of current status labels and a large set of informative features. In addition to providing theoretical justification of SCORNET, we demonstrate in both simulation and real-world EHR settings that SCORNET achieves efficiency akin to the parametric Weibull regression model, while also exhibiting semi-nonparametric flexibility and relatively low empirical bias in a variety of generative settings.
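To make the notion of a current status label concrete, the following is a minimal Python sketch of the classical supervised baseline, not of SCORNET itself. It assumes each labeled patient contributes only a check time and a binary label saying whether the event had occurred by then, and it estimates the survival curve by isotonic regression of the labels on the sorted check times, the standard nonparametric maximum likelihood approach for current status data. The simulated exponential onset times, the uniform check times, and all function names are illustrative assumptions and do not come from the paper.

```python
import random


def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y (unit weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit


def current_status_survival(check_times, labels):
    """Estimate S(t) at the sorted check times from current status labels.

    Assumes distinct check times; tied times would need weighted pooling.
    """
    order = sorted(range(len(check_times)), key=lambda i: check_times[i])
    times = [check_times[i] for i in order]
    cdf = pava([labels[i] for i in order])  # monotone estimate of P(T <= t)
    return times, [1.0 - f for f in cdf]


def simulate(n, seed=0):
    """Hypothetical data: exponential onset times that are never observed directly."""
    rng = random.Random(seed)
    event = [rng.expovariate(1.0) for _ in range(n)]
    check = [rng.uniform(0.0, 3.0) for _ in range(n)]  # time at which status is assessed
    labels = [1 if t <= c else 0 for t, c in zip(event, check)]
    return check, labels


check, labels = simulate(2000)
times, surv = current_status_survival(check, labels)
```

This baseline uses the labeled patients only. The abstract's point is that SCORNET additionally draws on a large set of unlabeled patients with informative features, which this sketch does not attempt.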
