Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs.

Author information

Paolo Domenico, Greco Carlo, Cortellini Alessio, Ramella Sara, Soda Paolo, Bria Alessandro, Sicilia Rosa

Affiliations

Unit of Computer Systems & Bioinformatics, Department of Engineering, University Campus Bio-Medico di Roma, Roma, Italy.

Research Unit of Radiation Oncology, Department of Medicine and Surgery, University Campus Bio-Medico di Roma, Roma, Italy.

Publication information

BMC Med Inform Decis Mak. 2025 Apr 18;25(1):169. doi: 10.1186/s12911-025-02998-6.

DOI:10.1186/s12911-025-02998-6

PMID:40251623

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12007135/

Abstract

The automated processing of Electronic Health Records (EHRs) poses a significant challenge due to their unstructured nature, rich in valuable, yet disorganized information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, existing literature primarily focuses on extracting handcrafted clinical features through NLP and NER methods without delving into their learned representations. In this work, we explore the untapped potential of these representations by considering their contextual richness and entity-specific information. Our proposed methodology extracts representations generated by a transformer-based NER model on EHR data, combines them using a hierarchical attention mechanism, and employs the obtained enriched representation as input for a clinical prediction model. Specifically, this study addresses Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHR data collected from an Italian clinical centre encompassing 838 records from 231 lung cancer patients. Whilst our study is applied to EHRs written in Italian, it serves as a use case to prove the effectiveness of extracting and employing high-level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in EHRs that the model considers the most crucial during the decision-making process. We validated this interpretability by measuring the agreement of domain experts on the importance assigned by the hierarchical attention mechanism to EHR information through a questionnaire. Results demonstrate the effectiveness of our method, showcasing statistically significant improvements over traditional manually extracted clinical features.
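
The abstract does not give the architecture in detail, so the following is only a minimal sketch of what a two-level (hierarchical) attention pooling over entity embeddings could look like, not the authors' implementation. It assumes that each record is already represented by a list of entity embedding vectors (for example, taken from a NER model) and that attention scores are plain dot products with a query vector. The names `attention_pool`, `hierarchical_pool`, `entity_query`, and `record_query`, as well as the random toy data, are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Pool vectors into one vector using softmax(dot(vector, query)) weights.

    Returns the pooled vector and the attention weights, which can be
    inspected to see which inputs contributed most.
    """
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def hierarchical_pool(records, entity_query, record_query):
    """Level 1: pool entity embeddings within each record.
    Level 2: pool the resulting record vectors into one patient vector.
    """
    record_vectors, entity_weights = [], []
    for entities in records:
        vec, weights = attention_pool(entities, entity_query)
        record_vectors.append(vec)
        entity_weights.append(weights)
    patient_vector, record_weights = attention_pool(record_vectors, record_query)
    return patient_vector, record_weights, entity_weights


# Toy data (assumption): one patient with 3 records holding 3, 5, and 2
# entity embeddings of dimension 8. In a trained model the queries would
# be learned parameters; here they are random placeholders.
random.seed(0)
DIM = 8


def random_vector():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


records = [[random_vector() for _ in range(n)] for n in (3, 5, 2)]
entity_query = random_vector()
record_query = random_vector()

patient_vector, record_weights, entity_weights = hierarchical_pool(
    records, entity_query, record_query
)
```

In this sketch, `patient_vector` would be the input to a downstream survival predictor, while `record_weights` and `entity_weights` are the quantities one would inspect for interpretability, in the spirit of the attention-based explanation described above.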
