Labaf Ashkan, Åhman-Persson Linda, Husu Leo Silvén, Smith J Gustav, Ingvarsson Annika, Evaldsson Anna Werther
Department of Clinical Sciences Lund, Cardiology, Section for Heart Failure and Valvular Disease, Lund University, Skåne University Hospital, Klinikgatan 15, Lund, 221 85, Sweden.
Department of Internal and Emergency Medicine, Skåne University Hospital, Malmö, Sweden.
Cardiovasc Ultrasound. 2025 Mar 3;23(1):3. doi: 10.1186/s12947-025-00338-2.
The incorporation of artificial intelligence (AI) into point-of-care ultrasound (POCUS) platforms has increased rapidly. The number of B-lines on lung ultrasound (LUS) serves as a useful tool for assessing pulmonary congestion. Interpretation, however, requires experience, and AI automation has therefore been pursued. This study aimed to test the agreement between the AI software embedded in a major vendor's POCUS system and visual expert assessment.
This single-center prospective study included 55 patients hospitalized for various respiratory symptoms, predominantly acutely decompensated heart failure. A 12-zone protocol was used. Two experts in LUS independently categorized the number of B-lines as 0, 1-2, 3-4, or ≥ 5. The intraclass correlation coefficient (ICC) was used to determine agreement.
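As an illustration of the two analytic steps described above, the following is a minimal sketch and not the authors' analysis code. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here. The function names `bin_b_lines` and `icc_2_1` and the example ratings are hypothetical.

```python
def bin_b_lines(count):
    """Map a raw B-line count to the ordinal categories 0, 1-2, 3-4, >= 5.

    Returns 0, 1, 2, or 3 for the four categories (labels are an assumption).
    """
    if count < 0:
        raise ValueError("B-line count cannot be negative")
    if count == 0:
        return 0
    if count <= 2:
        return 1
    if count <= 4:
        return 2
    return 3


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per subject (for example one LUS
    zone or one patient), with one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical per-zone counts from two raters, binned before comparison
rater_a = [0, 1, 3, 6, 2, 0]
rater_b = [0, 2, 4, 5, 1, 0]
binned = [[bin_b_lines(a), bin_b_lines(b)] for a, b in zip(rater_a, rater_b)]
agreement = icc_2_1(binned)
```

Because ICC(2,1) measures absolute agreement, a rater that systematically counts more B-lines than another is penalized even when the two rank zones in the same order.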
A total of 672 LUS zones were obtained, of which 584 (87%) were eligible for analysis. Compared with the expert reviewers, the AI significantly overcounted the number of B-lines per patient (23.5 vs. 2.8, p < 0.001). The AI found a greater proportion of zones with > 5 B-lines than the reviewers did (38% vs. 4%, p < 0.001). The ICC between the AI and the reviewers was 0.28 for the total sum of B-lines and 0.37 for the zone-by-zone method. The interreviewer agreement was excellent, with ICCs of 0.92 and 0.91, respectively.
This study demonstrated excellent interrater reliability of B-line counts from experts but poor agreement with the AI software embedded in a major vendor system, primarily due to overcounting. Our findings indicate that further development is needed to increase the accuracy of AI tools in LUS.