Long Aaron, Haggerty Christopher M, Finer Joshua, Hartzel Dustin, Jing Linyuan, Keivani Azadeh, Kelsey Christopher, Rocha Daniel, Ruhl Jeffrey, vanMaanen David, Metser Gil, Duffy Eamon, Mawson Thomas, Maurer Mathew, Einstein Andrew J, Beecy Ashley, Kumaraiah Deepa, Homma Shunichi, Liu Qi, Agarwal Vratika, Lebehn Mark, Leon Martin, Hahn Rebecca, Elias Pierre, Poterucha Timothy J
Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center/New York-Presbyterian Hospital, NY (A.L., G.M., E.D., T.M., M.M., A.J.E., D.K., S.H., Q.L., V.A., M. Lebehn, M. Leon, R.H., P.E., T.J.P.).
Departments of Biomedical Informatics (A.L., C.M.H., P.E.), Columbia University, New York, NY.
Circulation. 2024 Sep 17;150(12):911-922. doi: 10.1161/CIRCULATIONAHA.124.068996. Epub 2024 Jun 17.
Artificial intelligence, particularly deep learning (DL), has immense potential to improve the interpretation of transthoracic echocardiography (TTE). Mitral regurgitation (MR) is the most common valvular heart disease and presents unique challenges for DL, including the integration of multiple video-level assessments into a final study-level classification.
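The video-to-study integration step can be illustrated with a minimal sketch. The abstract does not state how the system combines video-level outputs, so the averaging rule, the function name `study_level_class`, and the example probabilities below are all assumptions for illustration only.

```python
from statistics import fmean

# The four ordinal MR classes named in the abstract.
MR_CLASSES = ["none/trace", "mild", "moderate", "severe"]


def study_level_class(video_probs):
    """Average per-video class probabilities and return the top class.

    video_probs: one 4-element probability list per color Doppler video.
    Mean pooling is an assumed aggregation rule, not the paper's method.
    """
    if not video_probs:
        raise ValueError("study has no MR color Doppler videos")
    mean_probs = [
        fmean(p[i] for p in video_probs) for i in range(len(MR_CLASSES))
    ]
    best = max(range(len(MR_CLASSES)), key=lambda i: mean_probs[i])
    return MR_CLASSES[best], mean_probs


# Hypothetical outputs for three videos from a single study.
example = [
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.35, 0.50, 0.10],
    [0.10, 0.55, 0.30, 0.05],
]
label, probs = study_level_class(example)
print(label, [round(p, 3) for p in probs])
```

Mean pooling lets one discordant view (here the second video, which favors moderate) be outvoted by the others. Alternatives such as taking the maximum severity or using a learned aggregator would behave differently.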
A novel DL system was developed to intake complete TTEs, identify color MR Doppler videos, and determine MR severity on a 4-step ordinal scale (none/trace, mild, moderate, and severe) using the reading cardiologist as a reference standard. This DL system was tested in internal and external test sets with performance assessed by agreement with the reading cardiologist, weighted κ, and area under the receiver-operating characteristic curve for binary classification of both moderate or greater and severe MR. In addition to the primary 4-step model, a 6-step MR assessment model was studied with the addition of the intermediate MR classes of mild-moderate and moderate-severe with performance assessed by both exact agreement and ±1 step agreement with the clinical MR interpretation.
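The agreement measures named above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the abstract does not say whether the weighted κ uses linear or quadratic weights (linear is the default here), and the labels and scores in the example are made up, not study data.

```python
def exact_agreement(ref, pred):
    """Fraction of studies where model and reference grades match."""
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)


def within_one_step(ref, pred):
    """Fraction of studies where the grades differ by at most one step."""
    return sum(abs(r - p) <= 1 for r, p in zip(ref, pred)) / len(ref)


def weighted_kappa(ref, pred, n_classes, power=1):
    """Cohen's weighted kappa; power=1 is linear, power=2 quadratic."""
    n = len(ref)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for r, p in zip(ref, pred):
        observed[r][p] += 1 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes))
           for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (abs(i - j) / (n_classes - 1)) ** power  # disagreement weight
            num += w * observed[i][j]       # observed weighted disagreement
            den += w * row[i] * col[j]      # disagreement expected by chance
    return 1.0 - num / den


def auroc(is_positive, scores):
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical grades: 0=none/trace, 1=mild, 2=moderate, 3=severe.
ref = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 1, 1, 1, 2, 3, 3, 2]
# Hypothetical model scores for "moderate or greater" MR.
scores = [0.05, 0.20, 0.10, 0.30, 0.70, 0.90, 0.95, 0.60]
moderate_or_greater = [r >= 2 for r in ref]

print(exact_agreement(ref, pred))
print(within_one_step(ref, pred))
print(round(weighted_kappa(ref, pred, 4), 4))
print(auroc(moderate_or_greater, scores))
```

The binary AUROC is obtained by thresholding the ordinal reference grade (here, moderate or greater). The same pattern applies to the severe-only endpoint with `r >= 3`.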
A total of 61 689 TTEs were split into train (n=43 811), validation (n=8891), and internal test (n=8987) sets with an additional external test set of 8208 TTEs. The model had high performance in MR classification in internal (exact accuracy, 82%; κ=0.84; area under the receiver-operating characteristic curve, 0.98 for moderate or greater MR) and external test sets (exact accuracy, 79%; κ=0.80; area under the receiver-operating characteristic curve, 0.98 for moderate or greater MR). Most disagreements with the reading cardiologist (63% internal and 66% external) were between none/trace and mild MR. MR classification accuracy was slightly higher using multiple TTE views (accuracy, 82%) than with only apical 4-chamber views (accuracy, 80%). In subset analyses, the model was accurate in the classification of both primary and secondary MR with slightly lower performance in cases of eccentric MR. In the analysis of the 6-step classification system, the exact accuracy was 80% and 76% with a ±1 step agreement of 99% and 98% in the internal and external test sets, respectively.
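As a quick arithmetic check on the reported split, the three internal partitions sum to the stated total, and their proportions can be derived directly. The counts come from the text above; the variable names are illustrative.

```python
# Partition sizes as reported in the abstract.
splits = {"train": 43811, "validation": 8891, "internal test": 8987}

total = sum(splits.values())
fractions = {name: n / total for name, n in splits.items()}

print(total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

This corresponds to roughly a 71/14/15 split. The 8208 external TTEs are held out separately and are not part of this total.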
This end-to-end DL system can intake entire echocardiogram studies to accurately classify MR severity and may be useful in helping clinicians refine MR assessments.