Combining deep learning with token selection for patient phenotyping from electronic health records.

Affiliations

Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland.

Steyr School of Management, University of Applied Sciences Upper Austria, 4400, Steyr Campus, Austria.

Publication information

Sci Rep. 2020 Jan 29;10(1):1432. doi: 10.1038/s41598-020-58178-1.

DOI: 10.1038/s41598-020-58178-1
PMID: 31996705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6989657/
Abstract

Artificial intelligence provides the opportunity to reveal important information buried in large amounts of complex data. Electronic health records (eHRs) are a source of such big data, providing a multitude of health-related clinical information about patients. However, text data from eHRs, e.g., discharge summary notes, are challenging to analyze because these notes are free-form texts whose writing formats and styles vary considerably between records. For this reason, in this paper we study deep learning neural networks in combination with natural language processing to analyze text data from clinical discharge summaries. We provide a detailed analysis of patient phenotyping, i.e., the automatic prediction of ten patient disorders, by investigating the influence of network architectures, sample sizes and the information content of tokens. Importantly, for patients suffering from Chronic Pain, the disorder that is most difficult to classify, we find the largest performance gain for a combined word- and sentence-level input convolutional neural network (ws-CNN). As a general result, we find that the combination of data quality and data quantity of the text data plays a crucial role in enabling more complex network architectures to improve significantly beyond a word-level input CNN model. From our investigations of learning curves and token selection mechanisms, we conclude that such a transition requires larger sample sizes because the amount of information per sample is quite small and is carried by only a few tokens and token categories. Interestingly, we found that the token frequency in the eHRs follows Zipf's law, and we utilized this behavior to investigate the information content of tokens by defining a token selection mechanism. The latter also addresses issues of explainable AI.
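
The abstract states that token frequencies in the discharge summaries follow Zipf's law and that this was used to define a token selection mechanism. A minimal sketch of the general idea, in plain Python: rank tokens by corpus frequency, estimate the Zipf exponent as the least-squares slope of log frequency against log rank, and keep only tokens inside a chosen rank window before passing text to a classifier. The tokenizer, the rank-window rule, the toy notes, and all function names (`tokenize`, `rank_frequency`, `zipf_exponent`, `select_tokens`, `filter_doc`) are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on runs of non-alphanumeric characters.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def rank_frequency(docs):
    # Count tokens over the whole corpus; sort by descending count, ties alphabetically.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def zipf_exponent(ranked):
    # Least-squares slope of log(count) against log(rank).
    # Zipf's law, count ~ C / rank**s with s close to 1, gives a slope near -s.
    if len(ranked) < 2:
        raise ValueError("need at least two ranked tokens")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def select_tokens(ranked, lo_rank, hi_rank):
    # Hypothetical selection rule: keep tokens whose 1-based frequency rank
    # lies in [lo_rank, hi_rank]. Not the authors' exact mechanism.
    return {tok for rank, (tok, _) in enumerate(ranked, start=1) if lo_rank <= rank <= hi_rank}


def filter_doc(doc, vocab):
    # Drop every token outside the selected vocabulary, preserving order.
    return [tok for tok in tokenize(doc) if tok in vocab]


if __name__ == "__main__":
    # Hypothetical toy notes, not data from the paper.
    notes = [
        "Patient denies chest pain.",
        "Chronic pain patient on opioids.",
        "Patient stable, pain controlled.",
    ]
    ranked = rank_frequency(notes)
    print(ranked[:2])  # [('pain', 3), ('patient', 3)]
    vocab = select_tokens(ranked, 1, 2)
    print(filter_doc(notes[1], vocab))  # ['pain', 'patient']
```

Shifting the rank window (for example, discarding the most frequent tokens or the long tail of rare ones) and retraining a classifier on the filtered notes is one way to probe how much information different frequency ranges carry; the abstract does not say how the paper chose its windows.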

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/f9aa1f86a398/41598_2020_58178_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/570a8f879678/41598_2020_58178_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/75de346ea574/41598_2020_58178_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/da49d2b90cdd/41598_2020_58178_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/1414b48e2730/41598_2020_58178_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/a9d4bfca9a9a/41598_2020_58178_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/c4528aec5dba/41598_2020_58178_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/fc2b9f43deb6/41598_2020_58178_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fc8/6989657/765db3042ca2/41598_2020_58178_Fig9_HTML.jpg

Similar articles

1. Combining deep learning with token selection for patient phenotyping from electronic health records.
Sci Rep. 2020 Jan 29;10(1):1432. doi: 10.1038/s41598-020-58178-1.
2. Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.
J Med Internet Res. 2017 Nov 6;19(11):e380. doi: 10.2196/jmir.8344.
3. Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study.
BMC Med Res Methodol. 2024 May 17;24(1):114. doi: 10.1186/s12874-024-02231-4.
4. Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding.
Drug Saf. 2019 Jan;42(1):113-122. doi: 10.1007/s40264-018-0765-9.
5. Combining structured and unstructured data for predictive models: a deep learning approach.
BMC Med Inform Decis Mak. 2020 Oct 29;20(1):280. doi: 10.1186/s12911-020-01297-6.
6. Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.
J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.
7. EHR-HGCN: An Enhanced Hybrid Approach for Text Classification Using Heterogeneous Graph Convolutional Networks in Electronic Health Records.
IEEE J Biomed Health Inform. 2024 Mar;28(3):1668-1679. doi: 10.1109/JBHI.2023.3346210. Epub 2024 Mar 6.
8. Intelligent diagnosis with Chinese electronic medical records based on convolutional neural networks.
BMC Bioinformatics. 2019 Feb 1;20(1):62. doi: 10.1186/s12859-019-2617-8.
9. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
10. Natural language processing with deep learning for medical adverse event detection from free-text medical narratives: A case study of detecting total hip replacement dislocation.
Comput Biol Med. 2021 Feb;129:104140. doi: 10.1016/j.compbiomed.2020.104140. Epub 2020 Nov 24.

Cited by

1. Role of artificial intelligence in revolutionizing drug discovery.
Fundam Res. 2024 May 9;5(3):1273-1287. doi: 10.1016/j.fmre.2024.04.021. eCollection 2025 May.
2. Towards automated phenotype definition extraction using large language models.
Genomics Inform. 2024 Oct 31;22(1):21. doi: 10.1186/s44342-024-00023-2.
3. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review.
Artif Intell Med. 2023 Dec;146:102701. doi: 10.1016/j.artmed.2023.102701. Epub 2023 Nov 1.
4. Improving Diagnostics with Deep Forest Applied to Electronic Health Records.
Sensors (Basel). 2023 Jul 21;23(14):6571. doi: 10.3390/s23146571.
5. Data Challenges for Externally Controlled Trials: Viewpoint.
J Med Internet Res. 2023 Apr 5;25:e43484. doi: 10.2196/43484.
6. How data science and AI-based technologies impact genomics.
Singapore Med J. 2023 Jan;64(1):59-66. doi: 10.4103/singaporemedj.SMJ-2021-438.
7. Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement.
JMIR Med Inform. 2023 Jan 3;11:e37805. doi: 10.2196/37805.
8. A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record.
Sci Rep. 2022 Oct 22;12(1):17737. doi: 10.1038/s41598-022-22585-3.
9. Artificial Intelligence in Rheumatoid Arthritis: Current Status and Future Perspectives: A State-of-the-Art Review.
Rheumatol Ther. 2022 Oct;9(5):1249-1304. doi: 10.1007/s40744-022-00475-4. Epub 2022 Jul 18.
10. Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction.
Sci Rep. 2022 Jun 24;12(1):10748. doi: 10.1038/s41598-022-13072-w.