Sugimoto Kento, Wada Shoya, Konishi Shozo, Okada Katsuki, Manabe Shirou, Matsumura Yasushi, Takeda Toshihiro
Department of Medical Informatics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan.
Department of Transformative System for Medical Information, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan.
JMIR Med Inform. 2023 Nov 14;11:e49041. doi: 10.2196/49041.
Radiology reports are usually written in a free-text format, which makes it challenging to reuse the reports.
To enable secondary use of these reports, we developed a 2-stage deep learning system that extracts clinical information and converts it into a structured format.
Our system mainly consists of 2 deep learning modules: entity extraction and relation extraction. For each module, state-of-the-art deep learning models were applied. We trained and evaluated the models using 1040 in-house Japanese computed tomography (CT) reports annotated by medical experts. We also evaluated the performance of the entire pipeline of our system. In addition, the ratio of annotated entities in the reports was measured to validate the coverage of the clinical information with our information model.
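The 2-stage design described above can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the `Entity` and `Relation` types, the labels `Observation`, `Anatomy`, and `located_in`, and the toy rule-based functions standing in for the trained models are all hypothetical. The sketch shows how the relation stage consumes the output of the entity stage, which is why errors in the first stage carry over into the end-to-end score.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str   # entity type (hypothetical label set)
    text: str


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str   # relation type (hypothetical label set)


def run_pipeline(
    report: str,
    extract_entities: Callable[[str], List[Entity]],
    classify_pair: Callable[[str, Entity, Entity], Optional[str]],
) -> Tuple[List[Entity], List[Relation]]:
    """Stage 1 finds entities; stage 2 labels ordered pairs of them."""
    entities = extract_entities(report)
    relations = []
    for head, tail in permutations(entities, 2):
        label = classify_pair(report, head, tail)
        if label is not None:
            relations.append(Relation(head, tail, label))
    return entities, relations


# Toy stand-ins for the two trained models (assumptions for illustration only).
def toy_extract(report: str) -> List[Entity]:
    lexicon = {"nodule": "Observation", "right upper lobe": "Anatomy"}
    found = []
    for phrase, label in lexicon.items():
        idx = report.find(phrase)
        if idx >= 0:
            found.append(Entity(idx, idx + len(phrase), label, phrase))
    return sorted(found, key=lambda e: e.start)


def toy_classify(report: str, head: Entity, tail: Entity) -> Optional[str]:
    if head.label == "Observation" and tail.label == "Anatomy":
        return "located_in"
    return None


entities, relations = run_pipeline(
    "A nodule is seen in the right upper lobe.", toy_extract, toy_classify
)
```

In a real system, both callables would be replaced by trained models; the pairing loop is one simple way to generate relation candidates and may differ from what the authors used.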
The microaveraged F1-scores of our best-performing model for entity extraction and relation extraction were 96.1% and 97.4%, respectively. The microaveraged F1-score of the 2-stage system, which is a measure of the performance of the entire pipeline of our system, was 91.9%. Our system showed encouraging results for the conversion of free-text radiology reports into a structured format. The coverage of clinical information in the reports was 96.2% (6595/6853).
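For readers unfamiliar with the metrics, the following minimal sketch shows how a microaveraged F1-score and the coverage ratio can be computed. It assumes exact-match scoring over sets of predicted and gold items per report, which may differ from the authors' matching criteria; the function names `micro_f1` and `coverage` are hypothetical. Only the counts 6595 and 6853 come from the abstract.

```python
from typing import Hashable, List, Set


def micro_f1(gold: List[Set[Hashable]], pred: List[Set[Hashable]]) -> float:
    """Pool true positives, false positives, and false negatives over all
    reports, then compute a single F1 from the pooled counts."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted items that exactly match gold
        fp += len(p - g)   # predicted items absent from gold
        fn += len(g - p)   # gold items the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def coverage(annotated: int, total: int) -> float:
    """Fraction of items in the reports that the information model covers."""
    return annotated / total


# Toy example with two reports (made-up items, not data from the study).
toy_gold = [{1, 2, 3}, {4}]
toy_pred = [{1, 2}, {4, 5}]
toy_score = micro_f1(toy_gold, toy_pred)

# Coverage reported in the abstract: 6595 of 6853.
reported_coverage_pct = round(100 * coverage(6595, 6853), 1)
```

Because microaveraging pools counts before computing the score, frequent entity or relation types weigh more heavily than rare ones.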
Our 2-stage deep learning system can extract clinical information from chest and abdomen CT reports accurately and comprehensively.