Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale.

Author information

Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore.

NUS Yong Loo Lin School of Medicine, Singapore, Singapore.

Publication information

JCO Clin Cancer Inform. 2024 May;8:e2300122. doi: 10.1200/CCI.23.00122.

Abstract

PURPOSE

To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports.

METHODS

A set of 4,522 computed tomography (CT) reports from 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. For comparison, we also developed an NLP information extraction (IE) system based on named entity recognition, assertion status detection, and relation extraction. Model performance was measured by F1 score on a test set and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports.
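The three IE stages named above can be illustrated with a deliberately tiny, rule-based sketch. This is not the authors' system: the lexicons, the negation cues, the same-sentence linking rule, and the function name `infer_metastatic_sites` are all assumptions made for illustration only.

```python
import re

# Hypothetical lexicons for illustration only; none of these come from the paper.
SITE_PATTERNS = {
    "liver": re.compile(r"\b(liver|hepatic)\b"),
    "lung": re.compile(r"\b(lungs?|pulmonary)\b"),
    "bone": re.compile(r"\b(bones?|osseous)\b"),
}
FINDING = re.compile(r"\b(metastasis|metastases|metastatic)\b")
NEGATION = re.compile(r"\b(no|without|negative for)\b")


def infer_metastatic_sites(report):
    """Return the set of sites with an affirmed metastasis mention."""
    sites = set()
    # Split the report into sentences on terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", report.lower()):
        # Step 1, entity recognition: is a metastasis finding mentioned?
        if not FINDING.search(sentence):
            continue
        # Step 2, assertion status: skip sentences carrying a negation cue.
        if NEGATION.search(sentence):
            continue
        # Step 3, relation extraction (assumed rule): link the finding to
        # every anatomical site mentioned in the same sentence.
        for site, pattern in SITE_PATTERNS.items():
            if pattern.search(sentence):
                sites.add(site)
    return sites


example = (
    "Multiple hepatic metastases are seen. "
    "No evidence of pulmonary metastases. "
    "The bones are unremarkable."
)
print(infer_metastatic_sites(example))
```

A production system would replace each rule with a trained component; the sketch only shows how the three stages combine into a multilabel output per report.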

RESULTS

The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. On the external validation sets comprising additional cancer types, positron emission tomography-CT scans, and magnetic resonance imaging scans, the IE system attained F1 scores of 0.89, 0.83, and 0.81, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were more frequent in de novo stage IV than in recurrent patients (29.7% v 12.2%; P < .001). Conversely, lung-only metastases were more frequent in recurrent than in de novo stage IV patients (17.2% v 7.3%; P < .001).
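For readers unfamiliar with how an F1 score is computed in a multilabel setting, the following is a minimal sketch. The abstract does not state which averaging scheme was used; micro-averaging is assumed here purely for illustration, and the function name `micro_f1` and the toy labels are hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-report label sets (assumed averaging scheme)."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # sites correctly predicted
        fp += len(p - g)  # sites predicted but not in the gold labels
        fn += len(g - p)  # gold sites that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, not from the study: three reports with gold and predicted site sets.
gold = [{"liver"}, {"lung", "bone"}, set()]
pred = [{"liver"}, {"lung"}, {"bone"}]
print(round(micro_f1(gold, pred), 3))
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.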

CONCLUSION

We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. Its methods are explainable, and it performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and could help identify potential patients for oligometastatic interventions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22d3/11371090/7fc0836958c7/cci-8-e2300122-g001.jpg