Deep Learning-based Assessment of Oncologic Outcomes from Natural Language Processing of Structured Radiology Reports.

Author information

Fink Matthias A, Kades Klaus, Bischoff Arved, Moll Martin, Schnell Merle, Küchler Maike, Köhler Gregor, Sellner Jan, Heussel Claus Peter, Kauczor Hans-Ulrich, Schlemmer Heinz-Peter, Maier-Hein Klaus, Weber Tim F, Kleesiek Jens

Affiliations

Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., M.S., M.K., C.P.H., H.U.K., T.F.W.) and Pattern Analysis and Learning Group, Department of Radiation Oncology (K.M.H.), Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg (TLRC), Member of the German Center for Lung Research (DZL), Heidelberg, Germany (M.A.F., A.B., M.M., M.S., M.K., C.P.H., H.U.K., T.F.W.); Faculty of Mathematics and Computer Science (K.K.) and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic (C.P.H.), Heidelberg University, Heidelberg, Germany; Division of Medical Image Computing (K.K., G.K., K.M.H.), Department of Computer Assisted Medical Interventions (CAMI) (J.S.), and Department of Radiology (H.P.S.), German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Partner Sites Essen and Heidelberg, Heidelberg, Germany (H.P.S., K.M.H., J.K.); and Institute for Artificial Intelligence in Medicine (IKIM), University Medicine Essen, Essen, Germany (J.K.).

Publication information

Radiol Artif Intell. 2022 Jul 20;4(5):e220055. doi: 10.1148/ryai.220055. eCollection 2022 Sep.

Abstract

PURPOSE

To train a deep natural language processing (NLP) model, using data mined from structured oncology reports (SOR), for rapid tumor response category (TRC) classification from free-text oncology reports (FTOR), and to compare its performance with that of human readers and conventional NLP algorithms.

MATERIALS AND METHODS

In this retrospective study, databases of three independent radiology departments were queried for SOR and FTOR dated from March 2018 to August 2021. An automated data mining and curation pipeline was developed to extract Response Evaluation Criteria in Solid Tumors (RECIST)-related TRCs from SOR for ground truth definition. A deep NLP model based on bidirectional encoder representations from transformers (BERT) and three feature-rich algorithms were trained on SOR to predict TRCs in FTOR. The models' F1 scores were compared with those of radiologists, medical students, and radiology technologist students. Lexical and semantic analyses were conducted to investigate human and model performance on FTOR.

RESULTS

Oncologic findings and TRCs were accurately mined from 9653 of 12 833 (75.2%) queried SOR, yielding oncology reports from 10 455 patients (mean age, 60 years ± 14 [SD]; 5303 women) who met inclusion criteria. On 802 FTOR in the test set, BERT achieved better TRC classification results (F1, 0.70; 95% CI: 0.68, 0.73) than the best-performing reference linear support vector classifier (F1, 0.63; 95% CI: 0.61, 0.66) and technologist students (F1, 0.65; 95% CI: 0.63, 0.67), had similar performance to medical students (F1, 0.73; 95% CI: 0.72, 0.75), but was inferior to radiologists (F1, 0.79; 95% CI: 0.78, 0.81). Lexical complexity and semantic ambiguities in FTOR influenced human and model performance, revealing maximum F1 score drops of -0.17 and -0.19, respectively.
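
The F1 scores and 95% CIs reported above can be illustrated with a minimal sketch. The abstract does not state which F1 averaging scheme or CI method was used, so the macro averaging and percentile bootstrap below are assumptions, and the function names and the toy category labels ("PD", "SD") are hypothetical placeholders rather than the study's actual labels or code.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list.

    Assumption: macro averaging; the abstract does not specify the scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for macro F1 (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample report indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data with placeholder category names, not taken from the study.
truth = ["PD", "PD", "SD", "SD"]
preds = ["PD", "SD", "SD", "SD"]
print(macro_f1(truth, preds), bootstrap_ci(truth, preds))
```

With only a handful of reports the interval is very wide; the narrow intervals in the abstract reflect the 802-report test set.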

CONCLUSION

The developed deep NLP model reached the performance level of medical students but not radiologists in curating oncologic outcomes from radiology FTOR.

Keywords: Neural Networks, Computer Applications-Detection/Diagnosis, Oncology, Research Design, Staging, Tumor Response, Comparative Studies, Decision Analysis, Experimental Investigations, Observer Performance, Outcomes Analysis

© RSNA, 2022.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e688/9530771/2960abd1666d/ryai.220055.VA.jpg
