Australian e-Health Research Centre, CSIRO, Queensland, Australia; School of Computing and Information Systems, The University of Melbourne, Victoria, Australia.
School of Computing and Information Systems, The University of Melbourne, Victoria, Australia; Centre for Digital Transformation of Health, The University of Melbourne, Victoria, Australia.
J Biomed Inform. 2023 Sep;145:104466. doi: 10.1016/j.jbi.2023.104466. Epub 2023 Aug 5.
With the growing volume and variety of healthcare data, multimodal machine learning that supports integrated modeling of structured and unstructured data is an increasingly important tool for clinical machine learning tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients can vary substantially in data availability, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities.
We propose a Transformer-based fusion model with modality-specific tokens that summarize the corresponding modalities, achieving effective cross-modal interaction while accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model as Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities.
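The core ideas described above can be illustrated with a minimal sketch: each modality contributes a dedicated summary token, the concatenated sequence is fused by self-attention in which a missing modality's keys are masked out, and an InfoNCE-style loss pulls together the two modality summaries of the same patient while pushing apart those of different patients. This is not the authors' implementation; all names, dimensions, and the single-head attention are illustrative assumptions.

```python
# Illustrative sketch of ARMOUR-style fusion (assumed details, not the
# published code): modality-specific tokens, attention masking for
# missing modalities, and an inter-modal contrastive (InfoNCE) loss.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (assumed)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(struct_tokens, text_tokens, has_struct=True, has_text=True):
    """Single-head self-attention over
    [struct summary token] + struct seq + [text summary token] + text seq.
    A missing modality's positions are masked out as attention keys."""
    mod_tok_s = rng.normal(size=(1, d))  # modality-specific summary tokens
    mod_tok_t = rng.normal(size=(1, d))
    seq = np.concatenate([mod_tok_s, struct_tokens, mod_tok_t, text_tokens])
    n_s = 1 + len(struct_tokens)         # length of the structured segment
    mask = np.ones(len(seq), dtype=bool)
    if not has_struct:
        mask[:n_s] = False
    if not has_text:
        mask[n_s:] = False
    scores = seq @ seq.T / np.sqrt(d)
    scores[:, ~mask] = -1e9              # ignore keys of missing modality
    out = softmax(scores, axis=-1) @ seq
    return out[0], out[n_s]              # fused struct / text summary vectors

def info_nce(z1, z2, tau=0.1):
    """Inter-modal, inter-sample contrastive loss: the two modality
    summaries of the same patient are positives; other patients in the
    batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))   # cross-entropy on the diagonal
```

Because the summary tokens always exist even when a modality's content is absent, the same forward pass handles complete and incomplete patients, which is what allows the two evaluation regimes described above.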
Our model shows improved performance over unimodal and multimodal baselines in both evaluation regimes, whether patients with missing modalities are included in or excluded from the input. Contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results.
We propose a multimodal model for robust clinical prediction to achieve improved performance while accommodating patients with missing modalities. This work could inspire future research to study the effective incorporation of multiple, more complex modalities of clinical data into a single model.