Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites.

Affiliations

INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université de Paris, Université Sorbonne Paris Cité, Paris, France.

Department of Medical Informatics, Hôpital Européen Georges Pompidou, Assistance publique-Hôpitaux de Paris, Paris, France.

Publication information

J Am Med Inform Assoc. 2021 Mar 1;28(3):504-515. doi: 10.1093/jamia/ocaa261.

DOI:10.1093/jamia/ocaa261
PMID:33319904
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7936396/
Abstract

BACKGROUND

The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck.

OBJECTIVE

To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks.

MATERIALS AND METHODS

Based on the literature across multiple research fields (NLP, bioinformatics, and clinical informatics), we selected articles that (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregated insights from this literature to define reproducibility recommendations. Finally, we assessed the compliance of 7 NLP frameworks with the recommendations.

RESULTS

We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of the features (26 features for LAPPS Grid, 22 features for OpenMinted), compared to 18 features for current clinical NLP frameworks (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows.
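
The reported counts can be turned into compliance shares with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original study: the counts are taken from the paragraph above, and the sketch assumes that the grouped figures (18 and 17) apply to each framework named in the group. The names `reported_counts` and `compliance_share` are illustrative only.

```python
# Total number of reproducibility features identified in the abstract.
TOTAL_FEATURES = 40

# Matched-feature counts as reported in the Results paragraph.
# Assumption: a count given for a group applies to each framework in it.
reported_counts = {
    "LAPPS Grid": 26,
    "OpenMinted": 22,
    "cTakes": 18,
    "CLAMP": 18,
    "GATE": 17,
    "ScispaCy": 17,
    "Textflows": 17,
}


def compliance_share(count, total=TOTAL_FEATURES):
    """Return the fraction of the identified features that a framework matches."""
    return count / total


shares = {name: compliance_share(c) for name, c in reported_counts.items()}

# Frameworks matching more than half of the features.
above_half = sorted(name for name, share in shares.items() if share > 0.5)

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11}{share:.1%}")
print("More than 50%:", ", ".join(above_half))
```

Under these assumptions, only the two WMS-based frameworks exceed the 50% threshold, which is consistent with the statement in the Results.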

DISCUSSION

34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP Framework. Nevertheless, frameworks based on WMS had a better compliance with the features.

CONCLUSION

NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance the reproducibility in a clinical setting.
