Natural language processing in narrative breast radiology reporting in University Malaya Medical Centre.

Authors

Tan Wee Ming, Ng Wei Lin, Ganggayah Mogana Darshini, Hoe Victor Chee Wai, Rahmat Kartini, Zaini Hana Salwani, Mohd Taib Nur Aishah, Dhillon Sarinder Kaur

Affiliations

Data Science and Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia.

Department of Biomedical Imaging, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia.

Publication information

Health Informatics J. 2023 Jul-Sep;29(3):14604582231203763. doi: 10.1177/14604582231203763.

Abstract

Radiology reporting is narrative, and its content depends on the clinician's ability to interpret the images accurately. A tertiary hospital such as the anonymous institute emphasizes narrative report writing as part of training for medical personnel. Nevertheless, free-text reports make it inconvenient to extract information for clinical audits and data mining. Therefore, we aim to convert unstructured breast radiology reports into a structured format using a natural language processing (NLP) algorithm. This study used 327 de-identified breast radiology reports from the anonymous institute. The radiologist identified the significant data elements to be extracted. Our NLP algorithm achieved 97% and 94.9% accuracy on the training and testing data, respectively. The structured information was then used to build a predictive model for the BIRADS category. The model based on random forest achieved the highest accuracy, 92%. Our study not only fulfilled the demands of clinicians by enhancing communication between medical personnel, but also demonstrated the usefulness of mineable structured data in yielding significant insights.
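
The abstract does not say which NLP technique the authors used, so the following is only an illustrative sketch of what turning a free-text report into structured fields can look like, here with simple rule-based pattern matching. The patterns, the field names (`birads`, `laterality`), the function `extract_fields`, and the sample sentence are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical pattern: matches forms such as "BIRADS 4", "BI-RADS: 4",
# and "BI-RADS category 4a". Real reports would need a broader rule set.
BIRADS_RE = re.compile(
    r"\bBI-?RADS\b\s*(?:category)?\s*[:\-]?\s*([0-6])([abc])?",
    re.IGNORECASE,
)
# Hypothetical pattern: first mention of the affected side.
SIDE_RE = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)


def extract_fields(report: str) -> dict:
    """Return a small structured record from one free-text report."""
    birads = BIRADS_RE.search(report)
    side = SIDE_RE.search(report)
    return {
        # Category digit plus optional subcategory letter, e.g. "4a".
        "birads": (birads.group(1) + (birads.group(2) or "").lower())
        if birads
        else None,
        "laterality": side.group(1).lower() if side else None,
    }


# Invented sample text, not from the study's data.
sample = (
    "There is an irregular mass in the left breast. "
    "Impression: BI-RADS category 4a."
)
print(extract_fields(sample))
```

Records of this shape, one per report, are the kind of tabular input on which a classifier such as a random forest could then be trained.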

