Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database.

Author information

Dong Tim, Sunderland Nicholas, Nightingale Angus, Fudulu Daniel P, Chan Jeremy, Zhai Ben, Freitas Alberto, Caputo Massimo, Dimagli Arnaldo, Mires Stuart, Wyatt Mike, Benedetto Umberto, Angelini Gianni D

Affiliations

Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK.

School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK.

Publication information

Bioengineering (Basel). 2023 Nov 10;10(11):1307. doi: 10.3390/bioengineering10111307.

Abstract

BACKGROUND

Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse, because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.

OBJECTIVES

To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.
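The kind of conversion described in this objective can be illustrated with a minimal rule-based sketch. It is not the authors' system: the regular expressions, the `extract_report` helper, the output field names and the sample sentence are all hypothetical, chosen only to show how continuous and discrete values in semi-structured text might be mapped to a structured record.

```python
import re

# Hypothetical patterns for three continuous measures named in the abstract.
# Real reports will vary in wording and units; these are assumptions.
NUMBER = r"(\d+(?:\.\d+)?)"
CONTINUOUS_PATTERNS = {
    "LVOT VTI": re.compile(r"\bLVOT\s+VTI\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
    "AV VTI": re.compile(r"\bAV\s+VTI\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
    "TR Vmax": re.compile(r"\bTR\s+Vmax\s*[:=]?\s*" + NUMBER, re.IGNORECASE),
}

# Hypothetical pattern for a discrete outcome (regurgitation severity).
SEVERITY_PATTERN = re.compile(
    r"\b(no|trivial|mild|moderate|severe)\s+"
    r"(mitral|aortic|tricuspid|pulmonary)\s+regurgitation\b",
    re.IGNORECASE,
)


def extract_report(text):
    """Return a flat dict of structured fields found in one report."""
    record = {}
    # Continuous outcomes: first numeric value after the label, else None.
    for name, pattern in CONTINUOUS_PATTERNS.items():
        match = pattern.search(text)
        record[name] = float(match.group(1)) if match else None
    # Discrete outcomes: one categorical field per valve mentioned.
    for match in SEVERITY_PATTERN.finditer(text):
        severity = match.group(1).lower()
        valve = match.group(2).lower()
        record[valve + " regurgitation"] = severity
    return record


# Invented sample sentence, not taken from the study data.
sample = "LVOT VTI: 21.5 cm. AV VTI 45 cm. TR Vmax = 2.8 m/s. Mild mitral regurgitation."
print(extract_report(sample))
```

Measures that are absent from a report come back as `None` in this sketch, which keeps missing values distinguishable from a measured zero.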

METHODS

A total of 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was then used to curate a moderate-fidelity database from extractions of the 133,889 application reports. The hold-out test set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.

RESULTS

Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75-0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
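The two headline metrics above can be computed as in the following sketch. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is an assumption here. The function names and all numbers in the example are invented for illustration and are not the study's data.

```python
def icc2_1(ratings):
    """ICC(2,1) for a table with one row per report and one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)        # number of reports
    k = len(ratings[0])     # number of raters (e.g. NLP system and clinician)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-report mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def accuracy(confusion):
    """Overall accuracy: correctly classified cases over all cases."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Invented example: both raters record identical values for each report.
identical = [[21.5, 21.5], [18.0, 18.0], [25.2, 25.2]]
print(round(icc2_1(identical), 6))

# Invented 3-class confusion matrix (rows: clinician, columns: NLP).
matrix = [[50, 2, 0], [3, 30, 1], [0, 1, 13]]
print(accuracy(matrix))
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers the score even when their values are perfectly correlated, which is one reason to report it alongside a correlation coefficient.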

CONCLUSIONS

The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dc3/10669818/48d73643c963/bioengineering-10-01307-g001.jpg
