Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives.

Authors

Gehrmann Sebastian, Dernoncourt Franck, Li Yeran, Carlson Eric T, Wu Joy T, Welt Jonathan, Foote John, Moseley Edward T, Grant David W, Tyler Patrick D, Celi Leo A

Affiliations

MIT Critical Data, Laboratory for Computational Physiology, Cambridge, MA, United States of America.

Harvard SEAS, Harvard University, Cambridge, MA, United States of America.

Publication information

PLoS One. 2018 Feb 15;13(2):e0192360. doi: 10.1371/journal.pone.0192360. eCollection 2018.

DOI:10.1371/journal.pone.0192360
PMID:29447188
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5813927/
Abstract

In secondary analysis of electronic health records, a crucial task is to correctly identify the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNNs) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
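The abstract reports model differences as gains in F1-score and AUC measured in percentage points. The following is a minimal sketch of how such a comparison can be computed for one binary phenotyping task. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of examples, which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percentage_point_gain(baseline: float, candidate: float) -> float:
    """Absolute difference between two metrics in [0, 1], in percentage points."""
    return 100.0 * (candidate - baseline)


# Hypothetical toy data: 1 = patient has the phenotype, 0 = does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]   # stand-in for a concept-based model
candidate_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]  # stand-in for a CNN

THRESHOLD = 0.5  # assumed decision threshold, not taken from the paper
baseline_pred = [int(s >= THRESHOLD) for s in baseline_scores]
candidate_pred = [int(s >= THRESHOLD) for s in candidate_scores]

f1_gain = percentage_point_gain(f1_score(y_true, baseline_pred),
                                f1_score(y_true, candidate_pred))
auc_gain = percentage_point_gain(roc_auc(y_true, baseline_scores),
                                 roc_auc(y_true, candidate_scores))

if __name__ == "__main__":
    print(f"F1 gain:  {f1_gain:.2f} percentage points")
    print(f"AUC gain: {auc_gain:.2f} percentage points")
```

A percentage-point gain is an absolute difference, not a relative one: moving from an F1 of 0.50 to 0.75 is a gain of 25 percentage points, even though it is a 50% relative improvement.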


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab95/5813927/cba4e410d9c9/pone.0192360.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab95/5813927/6bb1344a2b2d/pone.0192360.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab95/5813927/042bb38e70ac/pone.0192360.g002.jpg
