Tohyama Takeshi, Han Ahram, Yoon Dukyong, Paik Kenneth, Gow Brian, Izath Nura, Kpodonu Jacques, Celi Leo Anthony
medRxiv. 2025 Aug 19:2025.08.15.25333725. doi: 10.1101/2025.08.15.25333725.
Echocardiography serves as a cornerstone of cardiovascular diagnostics through multiple standardized imaging views. While recent AI foundation models demonstrate superior capabilities across cardiac imaging tasks, their massive computational requirements and reliance on large-scale datasets create accessibility barriers, limiting AI development to well-resourced institutions. Vector embedding approaches offer promising solutions by leveraging compact representations from original medical images for downstream applications. Furthermore, demographic fairness remains critical, as AI models may incorporate biases that confound clinically relevant features. We developed a multi-view encoder framework to address computational accessibility while investigating demographic fairness challenges.
We utilized the MIMIC-IV-ECHO dataset (7,169 echocardiographic studies) to develop a transformer-based multi-view encoder that aggregates view-level representations into study-level embeddings. The framework incorporated adversarial learning to suppress demographic information while maintaining clinical performance. We evaluated performance across 21 binary classification tasks encompassing echocardiographic measurements and clinical diagnoses, comparing against foundation model baselines with varying adversarial weights.
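The aggregation step described above, in which view-level representations are combined into one study-level embedding, can be illustrated with a minimal attention-pooling sketch. This is not the authors' transformer encoder: it is a single-query attention pool written in plain Python, and the names `pool_views` and `query` are hypothetical. In a trained model the query vector would be learned and the embeddings would be 512-dimensional; here both are tiny fixed values so the arithmetic can be followed by hand. Missing views are passed as `None` and simply skipped, which is one assumed way of handling studies with fewer views.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_views(
    view_embeddings: Sequence[Optional[Sequence[float]]],
    query: Sequence[float],
) -> List[float]:
    """Attention-pool the available view embeddings into one study embedding.

    Assumption: each view is a vector of the same length as `query`,
    and a missing view is represented by None.
    """
    present = [v for v in view_embeddings if v is not None]
    if not present:
        raise ValueError("at least one view embedding is required")

    dim = len(query)
    scale = math.sqrt(dim)

    # Scaled dot-product score between the query and each available view.
    scores = [sum(q * x for q, x in zip(query, v)) / scale for v in present]
    weights = softmax(scores)

    # Study embedding = attention-weighted sum of the view embeddings.
    return [sum(w * v[i] for w, v in zip(weights, present)) for i in range(dim)]


# Toy study with three view slots, one of them missing.
views = [[1.0, 0.0], None, [0.0, 1.0]]
study_embedding = pool_views(views, query=[0.0, 0.0])
print(study_embedding)
```

With a zero query every available view receives the same weight, so the result is the plain average of the two views that are present. A learned query would instead weight the more informative views more heavily.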
The multi-view encoder achieved a mean improvement of 9.0 AUC points (12.0% relative improvement) across clinical tasks compared to foundation model embeddings. Compared with the conventional approach, performance remained robust when fewer echocardiographic views were available. However, adversarial learning showed limited effectiveness in reducing demographic shortcuts, and stronger adversarial weighting substantially compromised diagnostic performance.
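The two headline figures can be related with a short back-of-the-envelope calculation. The sketch below assumes that the 9.0-point gain and the 12.0% relative gain refer to the same mean baseline AUC; the abstract does not state this, and if the relative improvement was instead averaged task by task the implied values would differ. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(abs_gain_points: float, rel_gain: float) -> float:
    """Baseline (in AUC points) implied by an absolute and a relative gain.

    Assumption: rel_gain = abs_gain_points / baseline, i.e. both figures
    are computed against the same mean baseline.
    """
    if rel_gain <= 0:
        raise ValueError("relative gain must be positive")
    return abs_gain_points / rel_gain


baseline_points = implied_baseline(9.0, 0.12)
encoder_points = baseline_points + 9.0
print(round(baseline_points, 1), round(encoder_points, 1))
```

Under that assumption the baseline mean AUC would be roughly 0.75 and the multi-view encoder roughly 0.84. These are inferred values, not results reported in the abstract.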
Our framework democratizes advanced cardiac AI capabilities, enabling substantial diagnostic improvements without massive computational infrastructure. While algorithmic approaches to demographic fairness showed limitations, the multi-view encoder provides a practical pathway for broader AI adoption in cardiovascular medicine with enhanced efficiency in real-world clinical settings.
Can multi-view encoder frameworks outperform foundation model embeddings on cardiac AI diagnostic tasks while reducing computational requirements and remaining robust when fewer echocardiographic views are available?
The multi-view encoder achieved a 12.0% relative improvement (9.0 AUC points) across 21 cardiac tasks compared to foundation model baselines, using efficient 512-dimensional vector embeddings and remaining robust with fewer echocardiographic views.
Vector embedding approaches with attention-based multi-view integration significantly improve cardiac diagnostic performance while reducing computational requirements, offering a pathway toward more efficient AI implementation in clinical settings.
Our proposed multi-view encoder framework addresses critical barriers to the widespread adoption of artificial intelligence in echocardiography. By dramatically reducing computational requirements, the multi-view encoder approach allows smaller healthcare institutions to develop sophisticated AI models locally. The framework maintains robust performance with fewer echocardiographic examinations, which addresses real-world clinical constraints where comprehensive imaging is not feasible due to patient factors or time limitations. This technology provides a practical way to democratize advanced cardiac AI capabilities, which could improve access to cardiovascular care across diverse healthcare settings while reducing dependence on proprietary datasets and massive computational resources.