

TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines.

Author affiliations

School of Engineering and Sciences, Tecnológico de Monterrey, Monterrey, 64849, Nuevo Leon, Mexico.

School of Computing, Macquarie University, Sydney, 2109, NSW, Australia.

Publication information

BMC Med Inform Decis Mak. 2024 Oct 24;24(1):310. doi: 10.1186/s12911-024-02717-7.

Abstract

BACKGROUND

Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-text radiological reports. Classifying such reports reliably requires a high-quality, curated, and annotated dataset. Currently, no publicly available dataset of breast imaging reports exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we constructed and annotated a dataset of breast imaging radiological reports and provide benchmark results on it. Board-certified radiologists at the Breast Radiology department of TecSalud Hospitals, Monterrey, Mexico, collected the reports and annotated them according to the BI-RADS lexicon and categories. The reports were originally written in Spanish and were first translated into English using Google Translate. They were then preprocessed by removing duplicates and records with missing values. The final dataset consists of 5046 unique reports from 5046 patients, all of them women, with an average age of 53 years. We used two word-level NLP representations, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information from the reports. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.
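The preprocessing and TF-IDF steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names `preprocess` and `tfidf`, the whitespace and case normalization used to detect duplicates, the regular-expression tokenizer, the smoothed inverse document frequency formula, and the toy report strings are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def preprocess(reports):
    """Drop missing or empty reports and duplicates, keeping first occurrences.

    Assumption: two reports count as duplicates when they match after
    lowercasing and collapsing whitespace.
    """
    seen = set()
    cleaned = []
    for text in reports:
        if text is None:
            continue
        normalized = " ".join(text.lower().split())
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed variant).
    """
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up toy strings, not taken from the TECRR dataset.
reports = [
    "Scattered fibroglandular densities. No suspicious mass.",
    "Scattered fibroglandular densities.  No suspicious mass.",
    None,
    "",
    "Irregular mass in the left breast. Biopsy recommended.",
]
cleaned = preprocess(reports)
vectors = tfidf(cleaned)
```

In this sketch, terms that occur in fewer reports receive larger weights than terms shared by all reports, which is the property that makes TF-IDF useful as a classifier input.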

RESULTS

The final dataset of breast imaging radiological reports contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier using preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812). This is 0.06 (6 percentage points) higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).
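As a rough illustration of the reported metric, the sketch below computes per-class sensitivity (recall) and its mean with a 95% confidence interval. The abstract does not say how the mean or the interval was obtained, so averaging over classes, the normal-approximation interval (z = 1.96), the function names, and the toy labels are all assumptions.

```python
import math
from statistics import mean, stdev


def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) per class: true positives / actual positives."""
    result = {}
    for label in sorted(set(y_true)):
        # Predictions for the samples whose true class is `label`.
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        result[label] = sum(1 for p in preds if p == label) / len(preds)
    return result


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation confidence interval (assumed method)."""
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Made-up toy labels standing in for BI-RADS categories.
y_true = [1, 1, 2, 2, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 2, 1, 3, 2]

sensitivity = per_class_sensitivity(y_true, y_pred)
mean_sens, ci_low, ci_high = mean_with_ci(list(sensitivity.values()))
```

With few classes or folds, an interval of this kind is wide, so two classifiers whose means differ by a few points can still have overlapping intervals.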

CONCLUSION

In this work, we present a curated and annotated benchmark dataset that can be used for BI-RADS category and breast density classification. We also provide baseline results for a range of ML, DL, and LLM models on BI-RADS classification, which can serve as a starting point for future work. The main objective is to provide a repository for investigators who wish to enter the field and advance it further.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e570/11515610/0ebbf92f7e75/12911_2024_2717_Fig1_HTML.jpg
