Hierarchical attention networks for information extraction from cancer pathology reports.

Authors

Gao Shang, Young Michael T, Qiu John X, Yoon Hong-Jun, Christian James B, Fearn Paul A, Tourassi Georgia D, Ramanathan Arvind

Affiliations

Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.

Surveillance Informatics Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.

Publication information

J Am Med Inform Assoc. 2018 Mar 1;25(3):321-330. doi: 10.1093/jamia/ocx131.

DOI: 10.1093/jamia/ocx131

PMID: 29155996

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7282502/

Abstract

OBJECTIVE

We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents.

MATERIALS AND METHODS

Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1-G4. Model performance metrics were compared to conventional machine learning (ML) approaches including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network.
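
The abstract does not spell out the HAN architecture, so the following is only a minimal sketch of the general idea behind hierarchical attention: attention over words yields one vector per sentence, and attention over those sentence vectors yields one vector per document. It assumes plain dot-product scoring against a context vector and omits the recurrent encoders and learned projections a trained HAN would use. The names `attention_pool`, `encode_document`, `word_context`, and `sent_context` are hypothetical, and the input vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_document(doc, word_context, sent_context):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    # Level 1: word-level attention collapses each sentence to one vector.
    sent_vectors = [attention_pool(sentence, word_context)[0] for sentence in doc]
    # Level 2: sentence-level attention collapses the document to one vector.
    return attention_pool(sent_vectors, sent_context)


# Toy document: two sentences with made-up 2-dimensional word vectors.
doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
doc_vector, sent_weights = encode_document(doc, [1.0, 0.0], [0.0, 1.0])
uniform_vector, uniform_weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a classifier, the resulting document vector would feed a final layer that predicts the primary site or grade label; the sentence weights indicate which sentences contributed most to that vector.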

RESULTS

Our results demonstrate that for both information tasks, HAN performed significantly better compared to the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460).
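
To make the two reported metrics concrete, here is a small sketch of how micro and macro F-scores differ for single-label multiclass predictions. It uses the standard definitions: micro pools true positives, false positives, and false negatives across all classes, while macro averages the per-class F1 so that rare classes count as much as common ones. The function name `f_scores` and the toy labels are illustrative assumptions, not the paper's data or evaluation code, and the set of classes averaged over in the paper may differ from the one used here.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    # Assumption: macro averages over every label seen in truth or prediction.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Made-up grade labels: the single G1 report is misclassified as G2.
y_true = ["G1", "G2", "G2", "G2", "G2", "G3"]
y_pred = ["G2", "G2", "G2", "G2", "G2", "G3"]
micro, macro = f_scores(y_true, y_pred)
```

On this toy input the macro score comes out lower than the micro score because the missed minority class (G1) contributes an F1 of zero to the macro average. The same effect may explain why the macro F-scores reported above are consistently lower than the micro F-scores when some classes have few examples.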

CONCLUSIONS

HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a818/7282502/5dbfb6e91240/ocx131f1.jpg

Similar articles

Hierarchical attention networks for information extraction from cancer pathology reports.

J Am Med Inform Assoc. 2018 Mar 1;25(3):321-330. doi: 10.1093/jamia/ocx131.

Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.

J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.

Classifying cancer pathology reports with hierarchical self-attention networks.

Artif Intell Med. 2019 Nov;101:101726. doi: 10.1016/j.artmed.2019.101726. Epub 2019 Oct 15.

Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification.

Artif Intell Med. 2019 Jun;97:79-88. doi: 10.1016/j.artmed.2018.11.004. Epub 2018 Nov 23.

Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports.

IEEE J Biomed Health Inform. 2018 Jan;22(1):244-251. doi: 10.1109/JBHI.2017.2700722. Epub 2017 May 3.

Hierarchical Recurrent Neural Hashing for Image Retrieval With Hierarchical Convolutional Features.

IEEE Trans Image Process. 2018;27(1):106-120. doi: 10.1109/TIP.2017.2755766.

Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes.

JAMIA Open. 2019 Apr;2(1):139-149. doi: 10.1093/jamiaopen/ooy061. Epub 2019 Jan 3.

Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports.

J Biomed Inform. 2020 Oct;110:103564. doi: 10.1016/j.jbi.2020.103564. Epub 2020 Sep 9.

Temporal indexing of medical entity in Chinese clinical notes.

BMC Med Inform Decis Mak. 2019 Jan 31;19(Suppl 1):17. doi: 10.1186/s12911-019-0735-x.

Decoding of finger trajectory from ECoG using deep learning.

J Neural Eng. 2018 Jun;15(3):036009. doi: 10.1088/1741-2552/aa9dbe. Epub 2017 Nov 28.

Cited by

A survey of NLP methods for oncology in the past decade with a focus on cancer registry applications.

Artif Intell Rev. 2025;58(10):314. doi: 10.1007/s10462-025-11316-5. Epub 2025 Jul 16.

Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.

JCO Clin Cancer Inform. 2025 Jun;9:e2500002. doi: 10.1200/CCI-25-00002. Epub 2025 Jun 27.

Replicating Current Procedural Terminology code assignment of rhinology operative notes using machine learning.

World J Otorhinolaryngol Head Neck Surg. 2024 May 28;11(2):198-206. doi: 10.1002/wjo2.188. eCollection 2025 Jun.

Developing and Validating an Automatic Support System for Tumor Coding in Pathology Reports in Spanish.

JCO Clin Cancer Inform. 2025 Feb;9:e2400124. doi: 10.1200/CCI.24.00124. Epub 2025 Feb 24.

Development of message passing-based graph convolutional networks for classifying cancer pathology reports.

BMC Med Inform Decis Mak. 2024 Sep 17;24(Suppl 5):262. doi: 10.1186/s12911-024-02662-5.

Investigating quantitative histological characteristics in renal pathology using HistoLens.

Sci Rep. 2024 Jul 30;14(1):17528. doi: 10.1038/s41598-024-68406-7.

Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses.

JMIR AI. 2023 Jun 6;2:e44835. doi: 10.2196/44835.

Integrating predictive coding and a user-centric interface for enhanced auditing and quality in cancer registry data.

Comput Struct Biotechnol J. 2024 Apr 7;24:322-333. doi: 10.1016/j.csbj.2024.04.007. eCollection 2024 Dec.

Systematic evaluation of common natural language processing techniques to codify clinical notes.

PLoS One. 2024 Mar 7;19(3):e0298892. doi: 10.1371/journal.pone.0298892. eCollection 2024.

Classification of neurologic outcomes from medical notes using natural language processing.

Expert Syst Appl. 2023 Mar 15;214. doi: 10.1016/j.eswa.2022.119171. Epub 2022 Nov 6.
