Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports.

Authors

Casey Arlene, Davidson Emma, Grover Claire, Tobin Richard, Grivas Andreas, Zhang Huayu, Schrempf Patrick, O'Neil Alison Q, Lee Liam, Walsh Michael, Pellie Freya, Ferguson Karen, Cvoro Vera, Wu Honghan, Whalley Heather, Mair Grant, Whiteley William, Alex Beatrice

Affiliations

Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.

Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.

Publication information

Front Digit Health. 2023 Sep 28;5:1184919. doi: 10.3389/fdgth.2023.1184919. eCollection 2023.

DOI:10.3389/fdgth.2023.1184919
PMID:37840686
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10569314/
Abstract

BACKGROUND

Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications.

METHODS

We used the F1 score, precision, and recall to compare NLP tools on two cohorts: a cohort from a study on delirium, with images and radiology reports from NHS Fife, and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques established during the development of EdIE-R, in conjunction with an expert researcher who read the underlying images.
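As a reminder of what the three metrics measure, the following is a minimal sketch of how precision, recall, and F1 could be computed when one tool's report-level output is compared with reference labels for a single phenotype. The function names, the `Counts` type, and the toy labels are hypothetical and are not taken from the study or from any of the four tools.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    """Confusion counts for one phenotype (true negatives are not needed)."""
    tp: int  # tool says present, reference says present
    fp: int  # tool says present, reference says absent
    fn: int  # tool says absent, reference says present


def count_labels(predicted: Sequence[bool], reference: Sequence[bool]) -> Counts:
    """Tally confusion counts from paired report-level labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    return Counts(tp, fp, fn)


def precision(c: Counts) -> float:
    """Share of the tool's positive calls that the reference confirms."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    """Share of reference positives that the tool found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy labels for five reports (hypothetical, not study data).
tool_output = [True, True, False, True, False]
reference = [True, False, False, True, True]
counts = count_labels(tool_output, reference)
print(counts, precision(counts), recall(counts), f1(counts))
```

Because F1 is a harmonic mean, a tool with very high precision can still have a modest F1 if its recall is low, which is one way a tool can lead on precision without leading on F1.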

RESULTS

EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R had F1 scores of ≥98% and ALARM+ ≥90%. ESPRESSO scored lowest at ≥77%, with Sem-EHR at ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy, at 89% in NHS Fife and 73% in Generation Scotland. When NLP tool output was compared with brain image reads using F1 scores, ALARM+ scored 80% for ischaemic stroke, outperforming EdIE-R at 66%. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR at 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%.
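The "≥" figures above can be read as the lower of a tool's two per-cohort F1 scores. The sketch below illustrates that reading under stated assumptions: the cohort keys and the 0/1 labels are invented toy values, not study data, and `worst_case_f1` is a hypothetical helper rather than anything the authors describe.

```python
from typing import Dict, Sequence


def f1_score(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """F1 from paired 0/1 labels, using F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def worst_case_f1(
    predicted_by_cohort: Dict[str, Sequence[int]],
    reference_by_cohort: Dict[str, Sequence[int]],
) -> float:
    """Lowest per-cohort F1, i.e. the value a '>= x%' summary would report."""
    return min(
        f1_score(predicted_by_cohort[name], reference_by_cohort[name])
        for name in reference_by_cohort
    )


# Toy labels for two hypothetical cohorts (not study data).
reference = {"cohort_a": [1, 1, 0, 0], "cohort_b": [1, 1, 1, 0]}
tool = {"cohort_a": [1, 1, 0, 0], "cohort_b": [1, 0, 1, 1]}

per_cohort = {name: f1_score(tool[name], reference[name]) for name in reference}
print(per_cohort)
print(worst_case_f1(tool, reference))
```

Reporting the minimum across cohorts makes a drop on the second cohort visible, which matters when a tool is applied to reports from a setting other than the one it was developed on.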

CONCLUSIONS

The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, with the variation most apparent for ischaemic stroke. If NLP tools are to be used in clinical settings, they cannot be used "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4543/10569314/88a4e0b973a5/fdgth-05-1184919-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4543/10569314/92999cc6f611/fdgth-05-1184919-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4543/10569314/62ccff6ff44e/fdgth-05-1184919-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4543/10569314/7335ff88eedd/fdgth-05-1184919-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4543/10569314/58f01c3aa037/fdgth-05-1184919-g005.jpg