Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports.

Author information

Knowledge and Evaluation Research Unit, Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Department of Medicine, Mayo Clinic, Rochester, Minnesota.

Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota.

Publication information

Endocr Pract. 2024 Nov;30(11):1051-1058. doi: 10.1016/j.eprac.2024.08.008. Epub 2024 Aug 26.

DOI:10.1016/j.eprac.2024.08.008

PMID:39197747

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531997/

Abstract

BACKGROUND

We aim to use Natural Language Processing to automate the extraction and classification of thyroid cancer risk factors from pathology reports.

METHODS

We analyzed 1410 surgical pathology reports from adult papillary thyroid cancer patients from 2010 to 2019. Structured and unstructured reports were used to create a consensus-based ground truth dictionary and were categorized into modified recurrence risk levels. Unstructured reports were narrative, whereas structured reports followed standardized formats. We developed ThyroPath, a rule-based Natural Language Processing pipeline, to extract thyroid cancer features and classify them into risk categories. Training used 225 reports (150 structured, 75 unstructured), and testing used 170 reports (120 structured, 50 unstructured). The pipeline's performance was assessed under both strict and lenient criteria using accuracy, precision, recall, and F1-score, a metric that combines precision and recall.
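The abstract does not define the strict and lenient criteria. As a minimal sketch, assuming that "strict" means an exact match of the extracted value and "lenient" means a partial (substring) overlap, precision, recall, and F1 for a single report could be computed as shown here. The function names, feature keys, and example values are hypothetical and are not taken from ThyroPath.

```python
from typing import Callable, Dict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(value.lower().split())


def strict_match(pred: str, gold: str) -> bool:
    """Assumed strict criterion: values must be identical after normalization."""
    return normalize(pred) == normalize(gold)


def lenient_match(pred: str, gold: str) -> bool:
    """Assumed lenient criterion: one value contains the other."""
    p, g = normalize(pred), normalize(gold)
    return bool(p) and bool(g) and (p in g or g in p)


def prf1(
    predicted: Dict[str, str],
    gold: Dict[str, str],
    match: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Precision, recall, and F1 over feature/value pairs for one report."""
    # True positive: the feature was extracted and its value matches the gold value.
    tp = sum(1 for k, v in predicted.items() if k in gold and match(v, gold[k]))
    # A wrong value counts as both a false positive and a false negative.
    fp = len(predicted) - tp
    fn = len(gold) - tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold annotations and pipeline output for one report.
gold = {
    "tumor_size": "1.2 cm",
    "margins": "negative",
    "lymph_nodes": "2 of 10 positive",
}
predicted = {
    "tumor_size": "1.2 cm",
    "margins": "negative margins",
    "extrathyroidal_extension": "present",
}

strict_scores = prf1(predicted, gold, strict_match)
lenient_scores = prf1(predicted, gold, lenient_match)
```

In this toy report, the lenient criterion credits the `margins` value that the strict criterion rejects, so the lenient scores are higher than the strict ones.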

RESULTS

In extraction tasks, ThyroPath achieved overall strict F1 scores of 93% for structured reports and 90% for unstructured reports, covering 18 thyroid cancer pathology features. In classification tasks, ThyroPath-extracted information yielded an overall accuracy of 93% in categorizing reports by their guideline-based risk of recurrence: 76.9% for high-risk, 86.8% for intermediate-risk, and 100% for both low-risk and very low-risk cases. With human-extracted pathology information, however, ThyroPath achieved 100% accuracy across all risk categories.
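Overall and per-category accuracy of this kind can be computed by comparing predicted and reference risk labels within each reference category. The following is a minimal sketch; the function name, label strings, and toy data are hypothetical and do not reproduce the study's data or results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accuracy_by_category(
    pairs: List[Tuple[str, str]],
) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and accuracy within each reference category.

    Each pair is (reference_label, predicted_label).
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for reference, predicted in pairs:
        total[reference] += 1
        if reference == predicted:
            correct[reference] += 1
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    per_category = {label: correct[label] / total[label] for label in total}
    return overall, per_category


# Toy (reference, predicted) risk labels for four reports.
toy_pairs = [
    ("high", "high"),
    ("high", "intermediate"),
    ("intermediate", "intermediate"),
    ("low", "low"),
]
overall_acc, per_category_acc = accuracy_by_category(toy_pairs)
```

Because categories differ in size, the overall accuracy is a count-weighted average of the per-category values rather than their simple mean.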

CONCLUSIONS

ThyroPath shows promise in automating feature extraction and recurrence risk classification from thyroid pathology reports at large scale. It offers an alternative to laborious manual reviews and a means of advancing virtual registries. However, it requires further validation before implementation.
