Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.

Authors

Culié Dorian, Schiappa Renaud, Contu Sara, Seutin Eva, Pace-Loscos Tanguy, Poissonnet Gilles, Villarme Agathe, Bozec Alexandre, Chamorey Emmanuel

Affiliations

Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France.

Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France.

Publication information

JCO Clin Cancer Inform. 2024 Dec;8:e2300263. doi: 10.1200/CCI.23.00263. Epub 2024 Dec 10.

DOI:10.1200/CCI.23.00263

PMID:39657101

Abstract

PURPOSE

Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.

MATERIALS AND METHODS

We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.
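The 70%/30% split and the three evaluation metrics described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed random seed, the use of `None` for "variable absent from the report", and the exact-match scoring convention are all assumptions, since the abstract does not specify how matches were counted.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs and split them into training and validation sets.

    The seed and the shuffling procedure are illustrative assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def precision_recall_f1(gold, predicted):
    """Score one variable against the gold standard, one value per report.

    Assumed convention: None means "no value"; a prediction is a true
    positive only if it exactly equals the gold value.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:   # a value was extracted but it is wrong
                fp += 1
            if g is not None:   # a gold value exists but was not recovered
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical usage with made-up IDs and labels.
train_ids, validation_ids = split_patients(range(1500))
gold = ["pT1", "pT2", None, "pT3"]
predicted = ["pT1", "pT3", "pT1", None]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

With 1,500 patients, this split yields 1,050 for training and 450 for validation. Splitting by patient rather than by report keeps all reports of one patient in the same set, which is a design choice assumed here rather than one stated in the abstract.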

RESULTS

Performance remained consistent across the test and validation sets, with most variables (30/42) exceeding 90% on all three metrics in both sets. Results varied by variable: pathologic tumor stage score achieved 100% in precision, recall, and F1 score, whereas the number of nodules achieved 45%, 28%, and 32%, respectively, in the test set. Surgical and preanesthesia reports demonstrated particularly high performance.

CONCLUSION

Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.
