Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal; Faculty of Engineering of the University of Porto, Portugal.
Hospital Center of Vila Nova de Gaia/Espinho, Portugal.
Artif Intell Med. 2024 Mar;149:102814. doi: 10.1016/j.artmed.2024.102814. Epub 2024 Feb 14.
Machine learning models need large amounts of annotated data for training. In medical imaging, labeled data is especially difficult to obtain because annotations must be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires less annotation effort and can therefore facilitate the creation of labeled medical image data sets. In this article, we summarize the literature on this topic from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, or neural NLP, and those describing systems that combine or compare two or more of these approaches. Despite the large variety of existing approaches, there is still room for further improvement. This work can contribute to the development of new techniques or the improvement of existing ones.