Dong Tim, Sunderland Nicholas, Nightingale Angus, Fudulu Daniel P, Chan Jeremy, Zhai Ben, Freitas Alberto, Caputo Massimo, Dimagli Arnaldo, Mires Stuart, Wyatt Mike, Benedetto Umberto, Angelini Gianni D
Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK.
School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK.
Bioengineering (Basel). 2023 Nov 10;10(11):1307. doi: 10.3390/bioengineering10111307.
Although electronic health records (EHR) provide useful insights into disease patterns and the optimisation of patient treatment, their reliance on unstructured data makes them difficult to analyse. Echocardiography reports, which contain extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been used successfully in a variety of medical fields, it is not commonly applied to echocardiography analysis.
To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.
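As an illustration of the kind of conversion described above, the following is a minimal sketch that assumes simple rule-based pattern matching. The report wording, field labels, units, severity vocabulary and the function name `extract_report` are all hypothetical; the abstract does not specify the rules or the report layout used in the study.

```python
import re

# Hypothetical patterns for continuous measurements. The labels and layout
# are assumptions for illustration, not the study's actual rules.
CONTINUOUS_PATTERNS = {
    "lvot_vti_cm": re.compile(r"LVOT\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "av_vti_cm": re.compile(r"\bAV\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "tr_vmax_m_s": re.compile(r"\bTR\s+Vmax\s*[:=]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
}

# Hypothetical pattern for a discrete outcome: "<severity> <valve> regurgitation".
SEVERITY_PATTERN = re.compile(
    r"\b(no|trivial|mild|moderate|severe)\s+"
    r"(mitral|aortic|tricuspid|pulmonary)\s+regurgitation",
    re.IGNORECASE,
)


def extract_report(text):
    """Return a flat dict of structured fields found in one report."""
    record = {}
    # Continuous outcomes: first numeric value following each label, else None.
    for name, pattern in CONTINUOUS_PATTERNS.items():
        match = pattern.search(text)
        record[name] = float(match.group(1)) if match else None
    # Discrete outcomes: one categorical field per valve mentioned.
    for severity, valve in SEVERITY_PATTERN.findall(text):
        record[f"{valve.lower()}_regurgitation"] = severity.lower()
    return record


example = (
    "LVOT VTI: 21.5 cm. AV VTI 45 cm. TR Vmax = 2.8 m/s. "
    "Mild mitral regurgitation. No aortic regurgitation."
)
structured = extract_report(example)
```

Here `structured` holds one numeric field per continuous measurement and one categorical field per valve, which is the structured and categorised shape the objective refers to. A real system would need many more patterns, unit handling and negation rules.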
A total of 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of the 133,889 application reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.
Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, P < 0.05) alongside high R values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75-0.9, P < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
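The agreement statistics reported above can be sketched as follows. The abstract does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), with the NLP system and a clinician treated as two raters. The function names and the toy data are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per report, one column per rater.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n = len(ratings)      # number of reports
    k = len(ratings[0])   # number of raters (e.g. NLP system and clinician)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-report mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def accuracy(reference_labels, extracted_labels):
    """Fraction of discrete labels on which the two sources agree."""
    pairs = list(zip(reference_labels, extracted_labels))
    return sum(ref == ext for ref, ext in pairs) / len(pairs)


# Toy data: identical extractions give an ICC of 1.
identical = [[21.5, 21.5], [45.0, 45.0], [2.8, 2.8]]
# A constant offset between raters lowers absolute-agreement ICC.
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

clinician = ["mild", "moderate", "severe", "mild"]
nlp = ["mild", "moderate", "mild", "mild"]
```

With these toy inputs, `icc_2_1(identical)` equals 1, `icc_2_1(offset)` equals 2/3, and `accuracy(clinician, nlp)` equals 0.75. Accuracy computed this way is the sum of the confusion matrix diagonal divided by the total count.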
The NLP-based technique performed well in extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a tool that converts semi-structured text into a structured echo report suitable for data management. Additional validation and implementation in healthcare settings could improve data availability and support research and clinical decision-making.