Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.

Authors

Culié Dorian, Schiappa Renaud, Contu Sara, Seutin Eva, Pace-Loscos Tanguy, Poissonnet Gilles, Villarme Agathe, Bozec Alexandre, Chamorey Emmanuel

Affiliations

Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France.

Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France.

Publication information

JCO Clin Cancer Inform. 2024 Dec;8:e2300263. doi: 10.1200/CCI.23.00263. Epub 2024 Dec 10.

Abstract

PURPOSE

Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.

MATERIALS AND METHODS

We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.
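The sampling scheme described above can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function name `split_cohort`, the fixed seed, and the hypothetical pool of 5,000 patient identifiers are assumptions introduced here, and the abstract does not state how the random selection was actually performed.

```python
import random


def split_cohort(patient_ids, n_dev=1500, n_test=100, train_frac=0.70, seed=0):
    """Draw a development set and a disjoint test set at random, then split
    the development set into training and validation subsets.

    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Sampling without replacement keeps the test patients out of the development set.
    sampled = rng.sample(list(patient_ids), n_dev + n_test)
    dev, test = sampled[:n_dev], sampled[n_dev:]
    n_train = round(n_dev * train_frac)
    return dev[:n_train], dev[n_train:], test


# Hypothetical cohort of 5,000 patient identifiers (the real cohort size is not given).
train, val, test = split_cohort(range(5000))
print(len(train), len(val), len(test))  # 1050 450 100
```

With the proportions from the text, 1,500 development patients yield 1,050 for training and 450 for validation, with 100 further patients held out for testing.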

RESULTS

Performance remained consistent across the test and validation sets, with most variables (30/42) exceeding 90% on precision, recall, and F1 score in both sets. Results varied by variable: in the test set, pathologic tumor stage reached 100% for precision, recall, and F1 score, whereas the number of nodules reached only 45%, 28%, and 32%, respectively. Extraction from surgical and preanesthesia reports performed particularly well.
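For readers unfamiliar with the three metrics, the sketch below shows one common way to compute them for a single extracted variable against a gold standard. It is an assumption-laden illustration: the function `precision_recall_f1`, the convention that a wrong value counts as both a false positive and a false negative, and the toy staging values are all hypothetical, and the abstract does not specify the exact scoring convention the authors used.

```python
def precision_recall_f1(gold, predicted):
    """Score one variable across reports.

    gold, predicted: equal-length lists with one value per report;
    None means the variable is absent (gold) or was not extracted (predicted).
    Assumed convention: a wrong extracted value is both a false positive
    and a false negative.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p is not None and p == g)
    fp = sum(1 for g, p in pairs if p is not None and p != g)
    fn = sum(1 for g, p in pairs if g is not None and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with made-up values for four reports.
gold = ["pT1a", "pT2", None, "pT3a"]
pred = ["pT1a", "pT1b", "pT2", None]
p, r, f1 = precision_recall_f1(gold, pred)
```

In this toy case there is one correct extraction, two spurious or wrong ones, and two missed or wrong gold values, so precision, recall, and F1 are each one third.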

CONCLUSION

Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.

