Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, 6229 ET, Maastricht, The Netherlands.
Department of Nuclear Medicine and Molecular Imaging, Tata Memorial Hospital, Mumbai, India.
J Digit Imaging. 2023 Jun;36(3):812-826. doi: 10.1007/s10278-023-00787-z. Epub 2023 Feb 14.
The rising incidence and mortality of cancer have led to a growing amount of research in the field. To learn from preexisting data, it has become important to capture as much information as possible on disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are available only as free text, and extracting information from such unstructured reports is labor-intensive. Natural Language Processing (NLP) tools can make extraction from radiology reports less time-consuming and more effective. In this study, we developed and compared different models for classifying lung carcinoma reports using clinical concepts. The study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based and machine learning models, which were then compared. The machine learning models were XGBoost and two deep learning architectures based on bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus of 1700 radiology reports, comprising computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from the MIMIC-III Clinical Database version 1.4 were used for external validation. The rule-based algorithm using expert input gave the best performance, with an overall F1 score of 0.94 on the internal set and 0.74 on external validation. Among the machine learning models, the Bi-LSTM_dropout model performed better than the XGBoost model and the Bi-LSTM_simple model on the internal set, whereas on external validation the Bi-LSTM_simple model performed relatively better than the other two.
This pipeline can be used for clinical concept-based classification of lung carcinoma radiology reports across a large corpus, and also for automated annotation of these reports.
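To illustrate what rule-based, concept-level classification and an overall F1 score involve, the following is a minimal sketch and not the pipeline described in the study. The concept labels, the regular expressions in `CONCEPT_PATTERNS`, and the choice of micro-averaging are all assumptions made for illustration; the abstract does not specify the rules, the label set, or the averaging method, and a usable system would also need negation and uncertainty handling.

```python
import re

# Hypothetical concept lexicon: labels and patterns are illustrative only,
# not the expert-defined rules used in the study.
CONCEPT_PATTERNS = {
    "mass": re.compile(r"\b(mass|nodule|lesion)\b", re.IGNORECASE),
    "metastasis": re.compile(r"\bmetasta(sis|ses|tic)\b", re.IGNORECASE),
    "effusion": re.compile(r"\bpleural effusion\b", re.IGNORECASE),
}


def classify_report(text):
    """Return the set of concept labels whose pattern matches the report text.

    No negation handling: "no pleural effusion" would still match "effusion".
    """
    return {label for label, pattern in CONCEPT_PATTERNS.items() if pattern.search(text)}


def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-report label sets (assumed averaging scheme)."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # labels found and correct
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # labels found but wrong
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences, not taken from any dataset.
reports = [
    "Spiculated mass in the right upper lobe with metastatic nodes.",
    "Moderate left pleural effusion.",
]
predicted = [classify_report(r) for r in reports]
```

Comparing `predicted` against expert-assigned label sets with `micro_f1` gives a single score of the kind reported for the internal and external sets, under the averaging assumption stated above.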