Comprehensive Word-Level Classification of Screening Mammography Reports Using a Neural Network Sequence Labeling Approach.

Affiliations

Department of Radiology, Duke University Medical Center, 2301 Erwin Road, Box 3808, Durham, NC, 27710, USA.

Scanslated, Inc., Durham, NC, USA.

Publication information

J Digit Imaging. 2019 Oct;32(5):685-692. doi: 10.1007/s10278-018-0141-4.

DOI:10.1007/s10278-018-0141-4

PMID:30338478

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6737114/

Abstract

Radiology reports contain a large amount of potentially valuable unstructured data. Recently, neural networks have been employed to perform classification of radiology reports over a few classes at the document level. The success of neural networks in sequence-labeling problems such as named entity recognition and part-of-speech tagging suggests that they could be used to classify radiology report text with greater granularity. We employed a neural network architecture to comprehensively classify mammography report text at the word level using a sequence labeling approach. Two radiologists devised a comprehensive classification system for screening mammography reports. Each word in each report was manually categorized by a radiologist into one of 33 categories according to the classification system. Tagged words referencing the same finding were grouped into unique sets. We pre-labeled reports with a rule-based algorithm and then manually edited these annotations for 6705 screening mammography reports (25.1%, 66.8%, and 8.1% BI-RADS 0, 1, and 2, respectively). A combined convolutional and recurrent neural network model was used to label words in each sentence of the individual reports. A siamese recurrent neural network was then used to group findings into sets. Performance of the neural network-based method was compared to a rule-based algorithm and a conditional random field (CRF) model. Global accuracy (percentage of documents where all word tags were predicted correctly) and keyword accuracy (percentage of all words that were labeled correctly, excluding words tagged as unimportant) were calculated on an unseen 519-report test set. Two-tailed t tests were used to assess differences between algorithm performance, and p < 0.05 was used to determine statistical significance. The neural network-based approach showed significantly higher global accuracy compared to both the rule-based algorithm (88.3% vs. 57.0%, p < 0.001) and the CRF model (88.3% vs. 75.8%, p < 0.001). The neural network also showed significantly higher keyword accuracy compared to the rule-based algorithm (95.5% vs. 80.9%, p < 0.001) and the CRF model (95.5% vs. 76.9%, p < 0.001). We demonstrate the potential of neural networks to accurately perform word-level multilabel classification of free-text radiology reports across 33 classes, thus showing the utility of a sequence labeling approach to NLP of radiology reports. We found that a neural network classifier outperforms a rule-based algorithm and a CRF classifier for comprehensive multilabel classification of free-text screening mammography reports at the word level. By approaching radiology report classification as a sequence-labeling problem, we demonstrate the ability of neural networks to extract data from free-text radiology reports at a level of granularity not previously reported.
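The two evaluation metrics defined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the tag names (`O`, `SIDE`, `FINDING`, `ASSESSMENT`) and the function names are hypothetical, and the sketch assumes that "excluding words tagged as unimportant" means skipping words whose reference (manual) tag is the unimportant tag.

```python
from typing import List, Sequence

# Hypothetical name for the "unimportant" tag; the abstract does not give one.
UNIMPORTANT = "O"


def global_accuracy(true_docs: Sequence[Sequence[str]],
                    pred_docs: Sequence[Sequence[str]]) -> float:
    """Fraction of documents in which every word tag was predicted correctly."""
    if len(true_docs) != len(pred_docs) or not true_docs:
        raise ValueError("need the same, non-zero number of documents")
    exact = sum(1 for t, p in zip(true_docs, pred_docs) if list(t) == list(p))
    return exact / len(true_docs)


def keyword_accuracy(true_docs: Sequence[Sequence[str]],
                     pred_docs: Sequence[Sequence[str]],
                     unimportant: str = UNIMPORTANT) -> float:
    """Fraction of words tagged correctly, pooled over all documents.

    Assumption: a word is excluded when its reference tag is `unimportant`.
    """
    correct = 0
    total = 0
    for t_doc, p_doc in zip(true_docs, pred_docs):
        if len(t_doc) != len(p_doc):
            raise ValueError("reference and predicted tags must align word by word")
        for t, p in zip(t_doc, p_doc):
            if t == unimportant:
                continue  # skip words the annotator marked as unimportant
            total += 1
            correct += int(t == p)
    if total == 0:
        raise ValueError("no keyword tokens to score")
    return correct / total


# Two toy "reports", one tag per word (made-up data for illustration only).
true_tags: List[List[str]] = [["O", "SIDE", "FINDING"], ["O", "O", "ASSESSMENT"]]
pred_tags: List[List[str]] = [["O", "SIDE", "FINDING"], ["O", "ASSESSMENT", "O"]]

print(global_accuracy(true_tags, pred_tags))   # one of two reports is fully correct
print(keyword_accuracy(true_tags, pred_tags))  # two of three keyword tokens are correct
```

Global accuracy is the stricter of the two, since a single wrong tag makes the whole report count as incorrect. That is consistent with the reported global accuracy figures being lower than the keyword accuracy figures for the neural network and the rule-based algorithm.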
