Yuan Victoria, Sahashi Yuki, Ieki Hirotaka, Vukadinovic Miloš, Binder Christina, Pieszko Konrad, Ambrosy Andrew P, Cheng Paul P, Cheng Susan, Ouyang David
Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA.
David Geffen School of Medicine, University of California, Los Angeles, CA.
medRxiv. 2025 Apr 30:2025.04.29.25326683. doi: 10.1101/2025.04.29.25326683.
Left ventricular diastolic dysfunction (LVDD) is most commonly evaluated by echocardiography. However, because no single metric identifies LVDD, it is assessed with a diagnostic algorithm that relies on secondary characteristics. This process is laborious and prone to interobserver variability.
To characterize concordance in clinical evaluations of LVDD, we reviewed historical echocardiogram studies at two academic medical centers and measured the variability between clinician text reports and assessment by the 2016 American Society of Echocardiography (ASE) guidelines. We then developed a workflow of 8 artificial intelligence (AI) models, trained on over 155,000 studies, to automate assessment of LVDD. Model performance was evaluated on temporally distinct held-out test sets from the two academic medical centers.
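To make the guideline-based assessment concrete, the following is a minimal sketch of the first step of the 2016 ASE algorithm as it is commonly summarized for patients with normal ejection fraction: four criteria are checked, and the study is labeled by the fraction that are positive. This is an illustration under stated assumptions, not the authors' pipeline. The function name, argument names, and the handling of missing measurements are hypothetical, and the subsequent grading step (grades I to III) is omitted.

```python
from typing import List, Optional


def classify_normal_ef(
    avg_e_over_e_prime: Optional[float],
    septal_e_prime_cm_s: Optional[float],
    lateral_e_prime_cm_s: Optional[float],
    tr_velocity_m_s: Optional[float],
    la_volume_index_ml_m2: Optional[float],
) -> str:
    """Label a study using the four 2016 ASE criteria for normal-EF patients.

    Hypothetical helper: missing measurements (None) are simply excluded,
    which is a simplification of how incomplete studies are handled.
    """
    criteria: List[bool] = []

    # Criterion 1: average E/e' ratio above 14.
    if avg_e_over_e_prime is not None:
        criteria.append(avg_e_over_e_prime > 14)

    # Criterion 2: septal e' below 7 cm/s or lateral e' below 10 cm/s.
    if septal_e_prime_cm_s is not None or lateral_e_prime_cm_s is not None:
        septal_low = septal_e_prime_cm_s is not None and septal_e_prime_cm_s < 7
        lateral_low = lateral_e_prime_cm_s is not None and lateral_e_prime_cm_s < 10
        criteria.append(septal_low or lateral_low)

    # Criterion 3: tricuspid regurgitation velocity above 2.8 m/s.
    if tr_velocity_m_s is not None:
        criteria.append(tr_velocity_m_s > 2.8)

    # Criterion 4: left atrial volume index above 34 mL/m^2.
    if la_volume_index_ml_m2 is not None:
        criteria.append(la_volume_index_ml_m2 > 34)

    if not criteria:
        raise ValueError("no measurements available")

    positive = sum(criteria)
    if 2 * positive > len(criteria):
        return "diastolic dysfunction"
    if 2 * positive == len(criteria):
        return "indeterminate"
    return "normal"
```

Even this first step needs several separate measurements per study, which illustrates why manual application is laborious and why readers may diverge when a measurement is borderline or missing.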
In a validation cohort of 955 studies from Cedars-Sinai Medical Center, our AI workflow demonstrated 76.5% agreement and a weighted Cohen's kappa of 0.52 with ASE guideline assessment using human measurements. In contrast, the clinician report evaluation had 48.5% agreement and a weighted Cohen's kappa of 0.29 with ASE guidelines. In the Stanford Healthcare cohort of 1,572 studies, the AI workflow had 66.7% agreement and a weighted Cohen's kappa of 0.27 with ASE guidelines, while the clinician assessment had 32.7% agreement and a weighted Cohen's kappa of 0.06. Performance was consistent across patient subgroups stratified by sex, age, hypertension, diabetes, obesity, and coronary artery disease.
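The two statistics reported above, percent agreement and weighted Cohen's kappa, can be computed as in the sketch below. It assumes LVDD grades are encoded as ordered integers starting at 0 (for example 0 = normal through 3 = grade III) and defaults to linear weights. Both the encoding and the weighting scheme are assumptions, since the abstract does not state which were used.

```python
from typing import List, Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of studies on which the two raters give the same grade."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(
    a: Sequence[int],
    b: Sequence[int],
    n_categories: int,
    weighting: str = "linear",
) -> float:
    """Weighted Cohen's kappa for ordinal grades in 0..n_categories-1.

    Disagreements are penalized by distance between grades:
    |i - j| / (k - 1) for linear weights, squared for quadratic weights.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = n_categories
    n = len(a)

    # Confusion matrix of observed rating pairs.
    observed: List[List[int]] = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            # Count expected under independence of the two raters.
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement
```

Kappa corrects for agreement expected by chance, so it can be much lower than raw percent agreement when one grade dominates a cohort. This is one plausible reason the Stanford cohort shows 66.7% agreement but a kappa of only 0.27.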
Clinicians are often inconsistent in evaluating LVDD. We developed an AI pipeline that automates the clinical workflow of grading LVDD and may thereby contribute to improved diagnosis of heart failure.