Semi-supervised Learning for Phenotyping Tasks.

Author information

Dligach Dmitriy, Miller Timothy, Savova Guergana K

Affiliations

Boston Children's Hospital and Harvard Medical School, Boston, MA.

Publication information

AMIA Annu Symp Proc. 2015 Nov 5;2015:502-11. eCollection 2015.

Abstract

Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of the required manual chart review.
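
The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the underlying classifier, so the multinomial naive Bayes model, the function names (`fit_weighted_em`, `predict_proba`), and the parameters (`lam`, `alpha`, `n_iter`) below are assumptions made for illustration, not the authors' implementation. Setting `lam=1` corresponds to basic EM, in which each unlabeled example counts as much as a labeled one; smaller values downweight the unlabeled data.

```python
import numpy as np


def predict_proba(X, log_prior, log_lik):
    """Posterior class probabilities under a multinomial naive Bayes model."""
    scores = X @ log_lik.T + log_prior           # shape (n_examples, n_classes)
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


def _m_step(X_lab, R_lab, X_unl, R_unl, lam, alpha):
    """Re-estimate parameters; counts from unlabeled data are scaled by lam."""
    n_classes = R_lab.shape[1]
    class_counts = R_lab.sum(axis=0) + lam * R_unl.sum(axis=0)
    word_counts = R_lab.T @ X_lab + lam * (R_unl.T @ X_unl)
    # Laplace (add-alpha) smoothing keeps every probability non-zero.
    log_prior = np.log(
        (class_counts + alpha) / (class_counts.sum() + alpha * n_classes)
    )
    smoothed = word_counts + alpha
    log_lik = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return log_prior, log_lik


def fit_weighted_em(X_lab, y_lab, X_unl, n_classes=2, lam=0.1,
                    n_iter=20, alpha=1.0):
    """Semi-supervised naive Bayes trained with weighted EM.

    X_lab, X_unl: feature count matrices; y_lab: integer class labels.
    lam=1.0 is basic EM; lam=0.0 ignores the unlabeled data entirely.
    """
    R_lab = np.eye(n_classes)[y_lab]                # fixed one-hot labels
    R_unl = np.zeros((X_unl.shape[0], n_classes))   # no soft labels yet
    # Initialize from the labeled data only.
    log_prior, log_lik = _m_step(X_lab, R_lab, X_unl, R_unl, lam, alpha)
    for _ in range(n_iter):
        # E-step: soft-label the unlabeled examples with the current model.
        R_unl = predict_proba(X_unl, log_prior, log_lik)
        # M-step: refit on labeled plus downweighted unlabeled examples.
        log_prior, log_lik = _m_step(X_lab, R_lab, X_unl, R_unl, lam, alpha)
    return log_prior, log_lik
```

How `lam` should be chosen is left open in this sketch; the abstract notes that cross-validation does not always find the best setting.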

Similar articles

1
Semi-supervised Learning for Phenotyping Tasks.
AMIA Annu Symp Proc. 2015 Nov 5;2015:502-11. eCollection 2015.
2
Weakly Semi-supervised phenotyping using Electronic Health records.
J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.
6
SemiBoost: boosting for semi-supervised learning.
IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2000-14. doi: 10.1109/TPAMI.2008.235.

Cited by

2
Pre-training phenotyping classifiers.
J Biomed Inform. 2021 Jan;113:103626. doi: 10.1016/j.jbi.2020.103626. Epub 2020 Nov 28.
