Alpert Medical School of Brown University, Providence, RI, USA; Lifespan, Providence, RI, USA; Providence VA Medical Center, Providence, RI, USA; Boston University, Boston, MA, USA.
Human Health & Performance Systems Group, MIT Lincoln Laboratory, Lexington, MA, USA.
Ultrasound Med Biol. 2024 Jun;50(6):825-832. doi: 10.1016/j.ultrasmedbio.2024.02.004. Epub 2024 Feb 29.
B-lines assessed by lung ultrasound (LUS) outperform physical exam, chest radiograph, and biomarkers for the diagnosis of acute heart failure (AHF) in the emergent setting. The use of LUS is, however, limited to trained professionals and suffers from interpretation variability. The objective was to use transfer learning to create AI-enabled software that aids novice users by automating LUS B-line interpretation.
Data from an observational AHF LUS study provided standardized cine clips for AI model development and evaluation. A total of 49,952 LUS frames from 30 patients were hand scored and used to train a convolutional neural network (CNN) to interpret B-lines at the frame level. A randomly selected, independent evaluation set of 476 LUS clips from 60 unique patients was used to assess model performance. The AI models scored the clips on both a binary scale and an ordinal 0-4 multiclass scale.
A multiclassification AI algorithm had the best performance at the binary level when applied to the independent evaluation set, with an AUC of 0.967 (95% CI 0.965-0.970) for detecting pathologic conditions. Compared with a blinded expert reviewer, the 0-4 multiclassification AI algorithm achieved a linear weighted kappa of 0.839 (95% CI 0.804-0.871).
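For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a linear weighted kappa (agreement on an ordinal 0-4 scale) and an AUC (binary discrimination) can be computed. The abstract does not state which software or exact procedure the authors used, so the function names `linear_weighted_kappa` and `binary_auc` are hypothetical, and the toy scores below are invented for illustration and are not study data.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, n_categories=5):
    """Cohen's kappa with linear weights for ordinal scores 0..n_categories-1.

    Disagreement between scores i and j is weighted |i - j| / (n_categories - 1),
    so a 0-vs-4 mismatch counts four times as much as a 0-vs-1 mismatch.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_a)
    k = n_categories
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    marg_a = Counter(rater_a)                  # marginal counts, rater A
    marg_b = Counter(rater_b)                  # marginal counts, rater B
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_disagreement += w * observed[(i, j)] / n
            exp_disagreement += w * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_disagreement == 0:
        return float("nan")  # undefined when both raters use a single category
    return 1.0 - obs_disagreement / exp_disagreement


def binary_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: expert vs. model scores on the 0-4 scale.
    expert = [0, 1, 2, 3, 4, 2, 0, 4]
    model = [0, 1, 2, 4, 4, 1, 0, 3]
    print("linear weighted kappa:", linear_weighted_kappa(expert, model))

    # Invented toy data: 1 = pathologic, 0 = normal, with model probabilities.
    truth = [0, 0, 1, 1]
    prob = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", binary_auc(truth, prob))
```

The confidence intervals reported above would need an additional step, such as bootstrapping over clips or patients; the abstract does not specify the method used, so none is sketched here.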
The multiclassification AI algorithm is a robust, well-performing model for both binary and ordinal multiclass B-line evaluation. This algorithm has the potential to be integrated into clinical workflows to assist users with quantitative and objective B-line assessment for the evaluation of AHF.