Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database.

Author information

Dong Tim, Sunderland Nicholas, Nightingale Angus, Fudulu Daniel P, Chan Jeremy, Zhai Ben, Freitas Alberto, Caputo Massimo, Dimagli Arnaldo, Mires Stuart, Wyatt Mike, Benedetto Umberto, Angelini Gianni D

Affiliations

Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK.

School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK.

Publication information

Bioengineering (Basel). 2023 Nov 10;10(11):1307. doi: 10.3390/bioengineering10111307.

DOI: 10.3390/bioengineering10111307

PMID: 38002431

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10669818/

Abstract

BACKGROUND

Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse, because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.

OBJECTIVES

To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.

METHODS

A total of 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1,075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.
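
The abstract does not describe how the extraction is implemented. As a minimal sketch only, assuming a rule-based approach with regular expressions and a hypothetical report layout (the label spellings, separators, units and severity vocabulary below are all assumptions, not taken from the study), converting continuous and discrete findings into a structured record might look like this:

```python
import re

# Hypothetical label patterns; the real report layout is not given in the abstract.
CONTINUOUS_PATTERNS = {
    "LVOT VTI": re.compile(r"LVOT\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "AV VTI": re.compile(r"\bAV\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "TR Vmax": re.compile(r"TR\s+Vmax\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
}

# Assumed severity vocabulary and valve names for the discrete outcomes.
SEVERITY_PATTERN = re.compile(
    r"\b(no|trivial|mild|moderate|severe)\s+"
    r"(mitral|aortic|tricuspid|pulmonary)\s+regurgitation",
    re.IGNORECASE,
)


def extract_report(text):
    """Return a flat dict of continuous values and regurgitation severities."""
    record = {}
    # Continuous outcomes: first numeric value following each label, else None.
    for name, pattern in CONTINUOUS_PATTERNS.items():
        match = pattern.search(text)
        record[name] = float(match.group(1)) if match else None
    # Discrete outcomes: one severity category per valve mentioned.
    for severity, valve in SEVERITY_PATTERN.findall(text):
        record[f"{valve.lower()} regurgitation"] = severity.lower()
    return record


# Made-up report text, for illustration only.
sample = (
    "LVOT VTI: 21.5 cm. AV VTI 48 cm. TR Vmax = 2.7 m/s. "
    "Mild mitral regurgitation. No aortic regurgitation."
)
result = extract_report(sample)
print(result)
```

Missing measurements are recorded as `None` rather than dropped, so every report yields the same set of columns when the records are loaded into a database.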

RESULTS

Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75-0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
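
The abstract does not say which form of ICC was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), and pairs it with overall accuracy computed from a confusion matrix. The function names and all numbers are illustrative assumptions, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per report and one column per rater."""
    n = len(ratings)       # number of reports
    k = len(ratings[0])    # number of raters (e.g. NLP system and clinician)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denom


def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified cases (diagonal) over all cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up LVOT VTI values: [NLP extraction, clinician extraction] per report.
paired_values = [[21.5, 21.5], [18.0, 18.0], [25.2, 25.2], [19.7, 19.7]]
print(icc_2_1(paired_values))

# Made-up 2x2 confusion matrix (rows: clinician label, columns: NLP label).
print(accuracy_from_confusion([[50, 2], [3, 45]]))
```

When the two raters report identical values for every report, the error and rater mean squares vanish and the ICC equals 1, which is the situation described for LVOT VTI, AV VTI and TR Vmax.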

CONCLUSIONS

The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dc3/10669818/48d73643c963/bioengineering-10-01307-g001.jpg

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system.

BMC Med Res Methodol. 2022 May 12;22(1):136. doi: 10.1186/s12874-022-01583-z.

Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.

Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.

Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models.

medRxiv. 2024 Oct 8:2024.10.08.24315035. doi: 10.1101/2024.10.08.24315035.

Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach.

JMIR Cardio. 2024 Sep 30;8:e60503. doi: 10.2196/60503.

Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts.

Health Informatics J. 2023 Apr-Jun;29(2):14604582231164696. doi: 10.1177/14604582231164696.

Performance Improvement of a Natural Language Processing Tool for Extracting Patient Narratives Related to Medical States From Japanese Pharmaceutical Care Records by Increasing the Amount of Training Data: Natural Language Processing Analysis and Validation Study.

JMIR Med Inform. 2025 Mar 4;13:e68863. doi: 10.2196/68863.

The Food and Drug Administration Biologics Effectiveness and Safety Initiative Facilitates Detection of Vaccine Administrations From Unstructured Data in Medical Records Through Natural Language Processing.

Front Digit Health. 2021 Dec 22;3:777905. doi: 10.3389/fdgth.2021.777905. eCollection 2021.

Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data.

JMIR Form Res. 2023 Mar 7;7:e43014. doi: 10.2196/43014.

Cited by

Determinants of artificial intelligence electrocardiogram-derived age and its association with cardiovascular events and mortality: a systematic review and meta-analysis.

NPJ Digit Med. 2025 May 29;8(1):322. doi: 10.1038/s41746-025-01727-7.

Evaluating large language models in echocardiography reporting: opportunities and challenges.

Eur Heart J Digit Health. 2025 Mar 31;6(3):326-339. doi: 10.1093/ehjdh/ztae086. eCollection 2025 May.

Enhancing Left Ventricular Segmentation in Echocardiograms Through GAN-Based Synthetic Data Augmentation and MultiResUNet Architecture.

Diagnostics (Basel). 2025 Mar 9;15(6):663. doi: 10.3390/diagnostics15060663.

Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification.

BMC Med Inform Decis Mak. 2025 Mar 7;25(1):115. doi: 10.1186/s12911-025-02897-w.

Ontology-guided machine learning outperforms zero-shot foundation models for cardiac ultrasound text reports.

Sci Rep. 2025 Feb 14;15(1):5456. doi: 10.1038/s41598-024-83540-y.

Triglyceride index as a predictor of mortality after cardiac surgery.

iScience. 2024 Oct 5;27(11):111107. doi: 10.1016/j.isci.2024.111107. eCollection 2024 Nov 15.

Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach.

JMIR Cardio. 2024 Sep 30;8:e60503. doi: 10.2196/60503.
