Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.

Author information

Hou Jue, Guo Zijian, Cai Tianxi

Affiliations

Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA.

Department of Statistics, Rutgers University, Piscataway, NJ 08854-8019, USA.

Publication information

J Mach Learn Res. 2023 Jan-Dec;24.

PMID: 38500567

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10947223/

Abstract

Risk modeling with electronic health records (EHR) data is challenging because the disease outcome is not directly observed and the predictors are high-dimensional. In this paper, we develop a surrogate-assisted semi-supervised learning approach that leverages a small labeled data set with annotated outcomes and a large unlabeled data set containing outcome surrogates and high-dimensional predictors. We propose to impute the unobserved outcomes by constructing a sparse imputation model from the outcome surrogates and high-dimensional predictors. We then apply a one-step bias correction to enable interval estimation for the risk prediction. Our inference procedure is valid even if both the imputation and risk prediction models are misspecified. Our novel way of utilizing unlabeled data enables high-dimensional statistical inference in the challenging setting of a dense risk prediction model. We present an extensive simulation study to demonstrate the superiority of our approach over existing supervised methods. We apply the method to genetic risk prediction of type 2 diabetes mellitus using an EHR biobank cohort.
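The impute-then-fit idea in the abstract can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and not the authors' implementation: the data-generating process, the single noisy surrogate `S`, the penalty `lam`, and the plain proximal-gradient L1-penalized logistic solver `fit_l1_logistic`. It covers only the imputation and risk-model fitting steps; the one-step bias correction and the interval estimation described in the abstract are not implemented here.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_l1_logistic(X, y, lam=0.02, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.
    y may hold binary labels or probabilities in [0, 1] (soft labels)."""
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])        # column 0 is an unpenalized intercept
    beta = np.zeros(p + 1)
    # Step size 1/L, where L = ||Xb||_2^2 / (4n) bounds the gradient's Lipschitz constant
    step = 4.0 * n / (np.linalg.norm(Xb, 2) ** 2)
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ beta) - y) / n
        beta = beta - step * grad
        # Soft-threshold every coefficient except the intercept
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * lam, 0.0)
    return beta

def predict_prob(beta, X):
    return sigmoid(beta[0] + X @ beta[1:])

# Simulated data (hypothetical): few labeled rows, many unlabeled rows
rng = np.random.default_rng(0)
n_lab, n_unlab, p = 200, 5000, 20
n_all = n_lab + n_unlab
X = rng.normal(size=(n_all, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -1.0, 0.5]
Y = rng.binomial(1, sigmoid(X @ true_beta))     # outcome, treated as observed only for labeled rows
S = 2.0 * Y + rng.normal(size=n_all)            # noisy surrogate, observed for every row

lab, unlab = slice(0, n_lab), slice(n_lab, n_all)
W = np.column_stack([S, X])                     # surrogate plus predictors

# Step 1: sparse imputation model fitted on the labeled data
gamma_hat = fit_l1_logistic(W[lab], Y[lab])
# Step 2: impute the unobserved outcomes on the unlabeled data
Y_imp = predict_prob(gamma_hat, W[unlab])
# Step 3: fit the risk model on predictors only, using the imputed outcomes
beta_hat = fit_l1_logistic(X[unlab], Y_imp)
risk = predict_prob(beta_hat, X[unlab])
```

Using the imputed probabilities as soft labels lets the large unlabeled set drive the risk-model fit, while the labeled set is needed only to learn the imputation model.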
