
Extracting cancer concepts from clinical notes using natural language processing: a systematic review.

Affiliations

Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran.

Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran.

Publication information

BMC Bioinformatics. 2023 Oct 29;24(1):405. doi: 10.1186/s12859-023-05480-0.

DOI: 10.1186/s12859-023-05480-0
PMID: 37898795
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10613366/
Abstract

BACKGROUND

Extracting information from free texts using natural language processing (NLP) can save time and reduce the hassle of manually extracting large quantities of data from incredibly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to identify cancer concepts from clinical notes automatically.

METHODS

PubMed, Scopus, Web of Science, and Embase were searched for English language papers using a combination of the terms concerning "Cancer", "NLP", "Coding", and "Registries" until June 29, 2021. Two reviewers independently assessed the eligibility of papers for inclusion in the review.

RESULTS

Most of the software programs used for concept extraction reported were developed by the researchers (n = 7). Rule-based algorithms were the most frequently used algorithms for developing these programs. In most articles, the criteria of accuracy (n = 14) and sensitivity (n = 12) were used to evaluate the algorithms. In addition, Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and Unified Medical Language System (UMLS) were the most commonly used terminologies to identify concepts. Most studies focused on breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%).
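
For readers unfamiliar with the two evaluation criteria named above, the following is a minimal sketch of how accuracy and sensitivity are conventionally computed from a confusion matrix. The function names and the example counts are illustrative assumptions and do not come from the reviewed studies.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all decisions (concept present / absent) that were correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly present concepts that the algorithm found (recall)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive cases to evaluate")
    return tp / positives


# Hypothetical counts for one extractor on 20 annotated notes.
tp, fp, tn, fn = 8, 1, 9, 2
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")
```

Accuracy alone can look high when most notes do not mention the concept at all, which is one reason studies commonly report sensitivity alongside it.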

CONCLUSION

The use of NLP for extracting the concepts and symptoms of cancer has increased in recent years. The rule-based algorithms are well-liked algorithms by developers. Due to these algorithms' high accuracy and sensitivity in identifying and extracting cancer concepts, we suggested that future studies use these algorithms to extract the concepts of other diseases as well.
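
To make the idea of a rule-based algorithm concrete, here is a minimal sketch of dictionary matching with a simple negation rule. It is not the method of any reviewed study: the lexicon, the concept labels (placeholders, not real SNOMED-CT or UMLS codes), the negation cues, and the function name `extract_concepts` are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon: surface phrases mapped to made-up concept labels.
LEXICON = {
    "breast cancer": "CONCEPT_BREAST_CANCER",
    "carcinoma of the breast": "CONCEPT_BREAST_CANCER",
    "lung cancer": "CONCEPT_LUNG_CANCER",
}

# Assumed negation cues; real systems use far richer rule sets.
NEGATION_CUE = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def extract_concepts(note: str) -> list[tuple[str, bool]]:
    """Return (concept_label, negated) pairs in order of appearance."""
    found = []
    for phrase, concept in LEXICON.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(note):
            # Look only at the text between the last period and the match.
            sentence_prefix = note[: match.start()].rsplit(".", 1)[-1]
            negated = bool(NEGATION_CUE.search(sentence_prefix))
            found.append((match.start(), concept, negated))
    return [(concept, negated) for _, concept, negated in sorted(found)]


note = "History of breast cancer. No evidence of lung cancer."
print(extract_concepts(note))
```

The appeal of this style is transparency: every extracted concept can be traced back to a specific lexicon entry and rule, at the cost of having to maintain those rules by hand.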


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9581/10613366/c73c6524edd2/12859_2023_5480_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9581/10613366/cf055d934956/12859_2023_5480_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9581/10613366/a2e1b2986d83/12859_2023_5480_Fig3_HTML.jpg

