Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports.

Affiliations

Knowledge and Evaluation Research Unit, Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Department of Medicine, Mayo Clinic, Rochester, Minnesota.

Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota.

Publication information

Endocr Pract. 2024 Nov;30(11):1051-1058. doi: 10.1016/j.eprac.2024.08.008. Epub 2024 Aug 26.

Abstract

BACKGROUND

We aim to use Natural Language Processing to automate the extraction and classification of thyroid cancer risk factors from pathology reports.

METHODS

We analyzed 1410 surgical pathology reports from adult patients with papillary thyroid cancer, spanning 2010 to 2019. Structured and unstructured reports were used to create a consensus-based ground-truth dictionary, and the reports were categorized into modified recurrence risk levels. Unstructured reports were narrative, whereas structured reports followed standardized formats. We developed ThyroPath, a rule-based Natural Language Processing pipeline, to extract thyroid cancer features and classify them into risk categories. The pipeline was trained on 225 reports (150 structured, 75 unstructured) and tested on 170 reports (120 structured, 50 unstructured). Its performance was assessed under both strict and lenient criteria using accuracy, precision, recall, and F1-score (a metric that combines precision and recall).
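The abstract does not describe ThyroPath's actual rules, so the following is only a minimal sketch of what a rule-based extractor of this kind can look like, assuming one regular expression per feature. The feature names (`tumor_size_cm`, `margin_status`, `extrathyroidal_extension`, `positive_lymph_nodes`), the patterns, and the `extract` function are hypothetical illustrations, not the paper's implementation.

```python
import re

# Hypothetical rules for illustration only; these are NOT ThyroPath's rules.
# Each feature maps to one case-insensitive pattern whose capture groups hold the value.
RULES = {
    # "Tumor size: 1.8 cm" -> "1.8"
    "tumor_size_cm": re.compile(r"tumou?r\s+size[^.\d]*?(\d+(?:\.\d+)?)\s*cm", re.I),
    # "Margins: uninvolved by carcinoma" -> "uninvolved"
    "margin_status": re.compile(
        r"margins?[^.]*?\b(uninvolved|involved|negative|positive)\b", re.I),
    # "Extrathyroidal extension: not identified" -> "not identified"
    "extrathyroidal_extension": re.compile(
        r"extrathyroidal\s+extension[^.]*?\b(not identified|identified|absent|present)\b", re.I),
    # "carcinoma in 3/12 lymph nodes" -> "3/12" (positive / examined)
    "positive_lymph_nodes": re.compile(r"(\d+)\s*/\s*(\d+)\s+lymph\s+nodes?", re.I),
}


def extract(report: str) -> dict[str, str]:
    """Return the first value found for each feature; features with no match are omitted."""
    found = {}
    for feature, pattern in RULES.items():
        m = pattern.search(report)
        if m:
            # Join multi-group matches (e.g. node counts) and normalize case.
            found[feature] = "/".join(g.lower() for g in m.groups())
    return found
```

For example, `extract("Tumor size: 1.8 cm. Margins: uninvolved by carcinoma.")` returns `{'tumor_size_cm': '1.8', 'margin_status': 'uninvolved'}`. A production pipeline covering 18 features across both structured and narrative reports would likely need many patterns per feature plus handling of negation and report-specific phrasing; this sketch shows only the basic dictionary-of-rules shape.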

RESULTS

In extraction tasks, ThyroPath achieved overall strict F1 scores of 93% for structured reports and 90% for unstructured reports, covering 18 thyroid cancer pathology features. In classification tasks, ThyroPath-extracted information categorized reports by their guideline-based risk of recurrence with an overall accuracy of 93%: 76.9% for high-risk, 86.8% for intermediate-risk, and 100% for both low- and very low-risk cases. When given human-extracted pathology information instead, ThyroPath achieved 100% accuracy across all risk categories.
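The abstract does not define its strict and lenient criteria. A common convention in clinical NLP evaluation, assumed here, is that a strict match requires the predicted span to carry the same feature label and exactly the same character offsets as the gold annotation, while a lenient match only requires the same label and overlapping offsets. Under that assumption, micro-averaged precision, recall, and F1 = 2PR / (P + R) can be computed as sketched below; the `Span` type, the match functions, and `score` are hypothetical names, not part of ThyroPath.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A hypothetical annotation: a feature label plus character offsets."""
    feature: str  # e.g. "tumor_size" (illustrative label)
    start: int    # inclusive start offset
    end: int      # exclusive end offset


def strict_match(a: Span, b: Span) -> bool:
    # Same label and identical offsets.
    return a.feature == b.feature and (a.start, a.end) == (b.start, b.end)


def lenient_match(a: Span, b: Span) -> bool:
    # Same label and any character overlap between the two spans.
    return a.feature == b.feature and a.start < b.end and b.start < a.end


def score(gold: list[Span], predicted: list[Span], lenient: bool = False) -> dict[str, float]:
    """Micro-averaged precision, recall, and F1 with greedy one-to-one matching."""
    match = lenient_match if lenient else strict_match
    unmatched = list(gold)  # each gold span may be credited at most once
    tp = 0
    for p in predicted:
        for g in unmatched:
            if match(p, g):
                unmatched.remove(g)  # safe: we break right after mutating
                tp += 1
                break
    fp = len(predicted) - tp  # predictions with no gold counterpart
    fn = len(gold) - tp       # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With gold spans `[Span("tumor_size", 10, 16), Span("margin", 40, 52)]` and predictions `[Span("tumor_size", 10, 16), Span("margin", 42, 50), Span("lymph_nodes", 70, 80)]`, the strict F1 is 0.4 because only the tumor-size span matches exactly, while the lenient F1 is 0.8 because the shifted margin span now overlaps its gold counterpart. Greedy matching is a simplification; an optimal bipartite matching could differ when many spans overlap.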

CONCLUSIONS

ThyroPath shows promise for automating the extraction and recurrence-risk classification of thyroid pathology reports at large scale. It offers an alternative to laborious manual reviews and a means of advancing virtual registries. However, it requires further validation before implementation.
