Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology.

Author information

Nobel J Martijn, Puts Sander, Bakers Frans C H, Robben Simon G F, Dekker André L A J

Affiliations

Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ Maastricht, Netherlands.

School of Health Professions Education, Maastricht University, Maastricht, Netherlands.

Publication information

J Digit Imaging. 2020 Aug;33(4):1002-1008. doi: 10.1007/s10278-020-00327-z.

Abstract

Reports are the standard means of communication between the radiologist and the referring clinician. Efforts are made to improve this communication by, for instance, introducing standardization and structured reporting. Natural Language Processing (NLP) is another promising tool that can improve and enhance the radiological report by processing free text. NLP adds structure to the report and exposes its information, which in turn can be used for further analysis. This paper describes pre-processing and processing steps and highlights important challenges to overcome in order to successfully implement a free text mining algorithm using NLP tools and machine learning in a small language area, such as Dutch. A rule-based algorithm was constructed to classify the T-stage of pulmonary oncology from the original free text radiological report, based on the items tumor size, presence and involvement according to the 8th TNM classification system. PyContextNLP, spaCy and regular expressions were used as tools to extract the correct information and process the free text. Overall accuracy of the algorithm for evaluating T-stage was 0.83 in the training set and 0.87 in the validation set, which shows that the approach in this pilot study is promising. Future research with larger datasets and external validation is needed to be able to introduce more machine learning approaches and perhaps to reduce the required input of domain-specific knowledge. However, a hybrid NLP approach will probably achieve the best results.
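
The abstract does not give the algorithm's actual rules, so the following is only a minimal sketch of the size-based part of such a rule-based approach. It uses plain regular expressions (not PyContextNLP or spaCy) and the size cut-offs of the 8th TNM edition. The function names and the Dutch sample sentences are hypothetical, written for illustration and not taken from the paper. Tumor presence, involvement of adjacent structures and negation, which the abstract says the real algorithm also considers, are ignored here.

```python
import re
from typing import Optional

# Matches a number directly followed by a unit, e.g. "3,5 cm", "35 mm", "4.2cm".
# Dutch reports commonly use a decimal comma, so both "," and "." are accepted.
# Limitation: in "3,5 x 2,1 cm" only the number next to the unit is captured.
SIZE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def extract_largest_size_cm(text: str) -> Optional[float]:
    """Return the largest measurement in the text in centimetres, or None.

    Assumption: every measurement refers to the primary tumor. A real system
    would need context rules to exclude lymph nodes or other findings.
    """
    sizes = []
    for value, unit in SIZE_PATTERN.findall(text):
        number = float(value.replace(",", "."))
        sizes.append(number / 10 if unit.lower() == "mm" else number)
    return max(sizes) if sizes else None


def t_stage_from_size(size_cm: Optional[float]) -> str:
    """Map tumor size to a main T category using 8th-edition TNM size cut-offs.

    Size only: invasion and separate nodules can raise the stage and are
    not modelled in this sketch.
    """
    if size_cm is None:
        return "unknown"
    if size_cm <= 3:
        return "T1"
    if size_cm <= 5:
        return "T2"
    if size_cm <= 7:
        return "T3"
    return "T4"


if __name__ == "__main__":
    # Hypothetical report fragments, invented for this example.
    for sentence in (
        "Massa in de rechter bovenkwab van 35 mm.",
        "Tumor van 7,2 cm in de linker onderkwab.",
        "Geen afwijkingen.",
    ):
        size = extract_largest_size_cm(sentence)
        print(sentence, "->", size, t_stage_from_size(size))
```

Keeping extraction and staging in separate functions means the staging rule can be checked against the TNM table on its own, while the extraction step can later be replaced by a context-aware component.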
