Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.

Authors

Zhang Yichi, Liu Molei, Neykov Matey, Cai Tianxi

Affiliations

Department of Computer Science and Statistics, University of Rhode Island.

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Publication

J Mach Learn Res. 2022;23.

PMID: 37974910
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10653017/
Abstract

Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, p, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label Y and the feature set X are observed) and a much larger, weakly-labeled dataset in which the feature set X is accompanied only by a surrogate label S that is available to all patients. Under a prior assumption that S is related to X only through Y and allowing it to hold approximately, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.
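The shrinkage idea described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' PASS estimator: it swaps in a ridge (squared L2) penalty and plain gradient descent, and it does not reproduce the paper's penalty, tuning, or theory. The function names (`prior_direction`, `fit_shrunk_logistic`), the parameters (`lam`, `lr`, `n_iter`, `rho`), and the simulated data are all hypothetical. The sketch first estimates a direction from a regression of the surrogate on the features using all patients, then fits a logistic model on the small labeled set while penalizing departures from a rescaled copy of that direction.

```python
import numpy as np


def sigmoid(t):
    # Clip to keep np.exp from overflowing on extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def prior_direction(X_all, S_all):
    """Least-squares slope of the surrogate S on X, using every patient."""
    Xc = X_all - X_all.mean(axis=0)
    Sc = S_all - S_all.mean()
    alpha, *_ = np.linalg.lstsq(Xc, Sc, rcond=None)
    return alpha


def fit_shrunk_logistic(X, y, alpha, lam=1.0, lr=0.2, n_iter=5000):
    """Minimize mean logistic loss + lam * ||beta - rho * alpha||^2.

    Illustrative stand-in for shrinking towards a prior direction.
    Reparametrized as beta = rho * alpha + delta, with the ridge
    penalty applied to delta only (intercept b0 and rho are free).
    """
    n, p = X.shape
    z = X @ alpha  # index along the prior direction
    b0, rho, delta = 0.0, 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = sigmoid(b0 + rho * z + X @ delta) - y
        b0 -= lr * resid.mean()
        rho -= lr * (resid @ z) / n
        delta -= lr * (X.T @ resid / n + 2.0 * lam * delta)
    return b0, rho, rho * alpha + delta


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Simulated data (hypothetical): p features, few labels, many surrogates.
rng = np.random.default_rng(0)
p, n_lab, n_all = 20, 100, 20000
beta_true = np.zeros(p)
beta_true[:5] = 1.0

X_all = rng.normal(size=(n_all, p))
Y_all = rng.binomial(1, sigmoid(X_all @ beta_true))
# The surrogate depends on X only through Y, matching the prior assumption.
S_all = 2.0 * Y_all + rng.normal(size=n_all)

# Pretend only the first n_lab patients have chart-review labels.
X_lab, y_lab = X_all[:n_lab], Y_all[:n_lab]

alpha_hat = prior_direction(X_all, S_all)
b0_hat, rho_hat, beta_hat = fit_shrunk_logistic(X_lab, y_lab, alpha_hat)

print("cosine(prior direction, true beta):", cosine(alpha_hat, beta_true))
print("cosine(shrunk estimate, true beta):", cosine(beta_hat, beta_true))
```

In this sketch a large `lam` pins the fitted coefficients to the prior direction, while a smaller `lam` lets the labeled data pull them away from it, which is one simple way to picture a fit that stays usable when the prior is only approximately right.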

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/2e5824e00828/nihms-1912660-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/50476dcd8a34/nihms-1912660-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/eee5473c6717/nihms-1912660-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/65c8bdcb7388/nihms-1912660-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/3c7f58b3b94f/nihms-1912660-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c443/10653017/140db56dd402/nihms-1912660-f0006.jpg

Similar articles

1
Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.
J Mach Learn Res. 2022;23.
2
Weakly Semi-supervised phenotyping using Electronic Health records.
J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.
3
Automated feature selection of predictors in electronic medical records data.
Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.
4
Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) with Electronic Health Records.
J Biomed Inform. 2024 Sep;157:104685. doi: 10.1016/j.jbi.2024.104685. Epub 2024 Jul 14.
5
Semi-supervised calibration of noisy event risk (SCANER) with electronic health records.
J Biomed Inform. 2023 Aug;144:104425. doi: 10.1016/j.jbi.2023.104425. Epub 2023 Jun 16.
6
Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms.
J Am Med Inform Assoc. 2024 Feb 16;31(3):640-650. doi: 10.1093/jamia/ocad226.
7
A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record.
Sci Rep. 2022 Oct 22;12(1):17737. doi: 10.1038/s41598-022-22585-3.
8
Robust and efficient semi-supervised estimation of average treatment effects with application to electronic health records data.
Biometrics. 2021 Jun;77(2):413-423. doi: 10.1111/biom.13298. Epub 2020 May 25.
9
Semi-supervised approach to event time annotation using longitudinal electronic health records.
Lifetime Data Anal. 2022 Jul;28(3):428-491. doi: 10.1007/s10985-022-09557-5. Epub 2022 Jun 26.
10
MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record.
J Biomed Inform. 2022 Oct;134:104190. doi: 10.1016/j.jbi.2022.104190. Epub 2022 Sep 1.

Cited by

1
Semi-supervised Triply Robust Inductive Transfer Learning.
J Am Stat Assoc. 2025;120:1037-1047. doi: 10.1080/01621459.2024.2393463. Epub 2024 Oct 10.
2
Patient Recruitment Using Electronic Health Records Under Selection Bias: A Two-Phase Sampling Framework.
Ann Appl Stat. 2024 Sep;18(3):1858-1878. doi: 10.1214/23-aoas1860. Epub 2024 Aug 5.
3
Estimation and Inference for High-Dimensional Generalized Linear Models with Knowledge Transfer.
J Am Stat Assoc. 2024;119(546):1274-1285. doi: 10.1080/01621459.2023.2184373. Epub 2023 Apr 12.
4
Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction.
J Mach Learn Res. 2023 Jan-Dec;24.