Automated extraction of information from free text of Spanish oncology pathology reports.

Affiliations

Universidad Nacional de Colombia, Facultad de Medicina, Departamento de Patología, Bogotá, Colombia.

Quantil SAS. Bogotá, Colombia.

Publication information

Colomb Med (Cali). 2023 Mar 30;54(1):e2035300. doi: 10.25100/cm.v54i1.5300. eCollection 2023 Jan-Mar.

DOI:10.25100/cm.v54i1.5300

PMID:37614525

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10443791/

Abstract

BACKGROUND

Pathology reports are stored as unstructured free text that is often ungrammatical, fragmented, and abbreviated, and whose wording varies from one pathologist to another. As a result, extracting tumor information from them requires significant human effort. Recording data in an efficient, high-quality format is essential for implementing and establishing a hospital-based cancer registry.

OBJECTIVE

This study aimed to describe the implementation of a natural language processing algorithm for oncology pathology reports.

METHODS

An algorithm was developed to process oncology pathology reports written in Spanish and extract 20 medical descriptors. The approach is based on the successive matching of regular expressions.
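As an illustration of what a successive regular-expression approach can look like, the sketch below applies three patterns in order, each one searching only the text isolated by the previous match. It is a minimal sketch, not the authors' implementation: the abstract does not list the actual patterns or the 20 descriptors, so the section labels ("diagnostico", "nota"), the pattern names SECTION, FINDING, and GRADE, the functions normalize and extract, and the sample report are all assumptions made for this example.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so patterns stay simple."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip()


# Hypothetical patterns, applied in order. Each stage only looks at the
# text isolated by the stage before it.
SECTION = re.compile(r"diagnostico\s*:\s*(?P<body>.+?)(?=\s*nota\s*:|$)")
FINDING = re.compile(
    r"(?P<topography>[a-z ]+?)\s*,\s*"
    r"(?P<procedure>biopsia|reseccion)\s*:\s*"
    r"(?P<morphology>[a-z ]+)"
)
GRADE = re.compile(r"grado\s+(?P<grade>[1-3])\b")


def extract(report: str) -> dict:
    """Return the descriptors found; an empty dict if there is no diagnosis section."""
    descriptors = {}

    # Stage 1: isolate the diagnosis section.
    section = SECTION.search(normalize(report))
    if section is None:
        return descriptors
    body = section.group("body")

    # Stage 2: split "<site>, <procedure>: <finding>" at the start of the section.
    finding = FINDING.match(body)
    if finding is not None:
        for name, value in finding.groupdict().items():
            descriptors[name] = value.strip()

    # Stage 3: look for a numeric grade anywhere in the same section.
    grade = GRADE.search(body)
    if grade is not None:
        descriptors["grade"] = grade.group("grade")
    return descriptors


# Invented sample report, for illustration only.
SAMPLE = """DIAGNÓSTICO:
Mama izquierda, biopsia: Carcinoma ductal infiltrante, grado 2.
NOTA: Se recomienda correlación clínica."""

print(extract(SAMPLE))
```

Normalizing accents and case first keeps each pattern short, and chaining the patterns means a later stage cannot pick up text from an unrelated part of the report. A production system would need many more patterns to cope with abbreviations and the variation between pathologists that the background describes.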

RESULTS

Validation was performed on 140 pathology reports. Topography was identified in all 140 reports by both manual (human) review and the algorithm. Morphology was identified in 138 reports by manual review and in 137 by the algorithm. The average fuzzy matching score between the two was 68.3 for topography and 89.5 for morphology.
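To make the fuzzy matching score concrete, the sketch below compares a human-extracted string with an algorithm-extracted string and averages the result over several reports. The abstract does not say which fuzzy matching library or formula was used, so this example substitutes Python's standard difflib.SequenceMatcher ratio scaled to 0-100. The function names fuzzy_score and average_score, the choice to skip reports where either side found nothing, and the example pairs are assumptions.

```python
from difflib import SequenceMatcher
from statistics import mean


def fuzzy_score(human: str, algorithm: str) -> float:
    """Similarity on a 0-100 scale after lowercasing and trimming both strings."""
    a, b = human.lower().strip(), algorithm.lower().strip()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def average_score(pairs) -> float:
    """Mean score over (human, algorithm) pairs where both sides found a value.

    Skipping incomplete pairs is an assumption of this sketch, not a detail
    taken from the study.
    """
    scores = [fuzzy_score(h, a) for h, a in pairs if h and a]
    return mean(scores) if scores else 0.0


# Invented pairs, for illustration only.
pairs = [
    ("carcinoma ductal infiltrante", "Carcinoma ductal infiltrante"),
    ("adenocarcinoma", "adenocarcinoma moderadamente diferenciado"),
    ("melanoma", None),  # the algorithm found nothing; skipped in the average
]
print(round(average_score(pairs), 1))
```

A score of 100 means the two strings are identical after normalization, and lower scores reflect partial overlap, which is why an algorithm can identify a descriptor in every report and still average well below 100.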

CONCLUSIONS

A preliminary validation of the algorithm against human extraction was performed on a small set of reports, with satisfactory results. This shows that a regular-expression approach can accurately and precisely extract multiple specimen attributes from free-text Spanish pathology reports. Additionally, we developed a website to facilitate collaborative validation at a larger scale, which may be helpful for future research on the subject.
