Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA.

Publication information

J Am Med Inform Assoc. 2013 Dec;20(e2):e253-9. doi: 10.1136/amiajnl-2013-001945. Epub 2013 Jul 13.

Abstract

OBJECTIVES

Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.

METHODS

We integrated an uncertainty sampling AL approach with support vector machines-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts including rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained at least all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of the AL was compared with a passive learning (PL) approach based on random sampling.
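
The comparison described above can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: the abstract does not state the SVM kernel, the uncertainty measure, the batch size, or the seed set, so the linear SVM trained by subgradient descent, the "closest to the hyperplane" criterion, and every function name below are assumptions made for the example.

```python
import random

def train_linear_svm(X, y, epochs=100, eta=0.1, lam=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.
    Labels must be +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink weights (L2 penalty), then correct margin violations.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def decision(w, b, x):
    """Signed score of x; its magnitude grows with distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def query_most_uncertain(w, b, X, pool_idx, batch_size):
    """Uncertainty sampling (assumed criterion): pick the unlabeled samples
    whose decision score is closest to zero."""
    ranked = sorted(pool_idx, key=lambda i: abs(decision(w, b, X[i])))
    return ranked[:batch_size]

def query_random(pool_idx, batch_size, rng):
    """Passive learning baseline: pick unlabeled samples at random."""
    return rng.sample(pool_idx, min(batch_size, len(pool_idx)))

def run_learning(X, y, seed_idx, rounds, batch_size, strategy, rng):
    """Grow the labeled set round by round; returns the labeled indices in order."""
    labeled = list(seed_idx)
    labeled_set = set(labeled)
    pool = [i for i in range(len(X)) if i not in labeled_set]
    for _ in range(rounds):
        w, b = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
        if strategy == "active":
            picked = query_most_uncertain(w, b, X, pool, batch_size)
        else:
            picked = query_random(pool, batch_size, rng)
        # In practice a chart reviewer would annotate the picked records here.
        labeled.extend(picked)
        picked_set = set(picked)
        pool = [i for i in pool if i not in picked_set]
    return labeled
```

Running `run_learning` once with `strategy="active"` and once with `strategy="passive"` on the same seed set, and scoring the classifier on a held-out set after each round, would give the two learning curves that such a comparison needs.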

RESULTS

Our evaluation showed that AL outperformed PL on three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.
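
One plausible way to read the percentage reductions above is as the relative drop in the number of annotated samples at which each learning curve first reaches the target AUC. The sketch below shows that arithmetic only; the abstract does not spell out the exact procedure, the function names are hypothetical, and the curve values are invented for illustration rather than taken from the study.

```python
def samples_to_reach(curve, target_auc):
    """Smallest number of annotated samples at which the curve reaches target_auc.
    `curve` is a list of (n_annotated, auc) pairs; returns None if never reached."""
    for n, auc in sorted(curve):
        if auc >= target_auc:
            return n
    return None

def percent_reduction(n_passive, n_active):
    """Relative saving in annotation effort of active over passive learning."""
    return 100.0 * (n_passive - n_active) / n_passive

# Invented learning curves (NOT data from the study), for illustration only.
passive_curve = [(50, 0.80), (100, 0.90), (200, 0.95)]
active_curve = [(50, 0.84), (120, 0.95)]

n_pl = samples_to_reach(passive_curve, 0.95)  # 200
n_al = samples_to_reach(active_curve, 0.95)   # 120
reduction = percent_reduction(n_pl, n_al)     # 40.0
```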

CONCLUSIONS

This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.
