MULTI-MODAL DATA FUSION SCHEMES FOR INTEGRATED CLASSIFICATION OF IMAGING AND NON-IMAGING BIOMEDICAL DATA.

Authors

Tiwari Pallavi, Viswanath Satish, Lee George, Madabhushi Anant

Affiliation

Department of Biomedical Engineering, Rutgers University, NJ, USA.

Publication

Proc IEEE Int Symp Biomed Imaging. 2011 Mar-Apr;2011:165-168. doi: 10.1109/ISBI.2011.5872379.

DOI:10.1109/ISBI.2011.5872379

PMID:25705325

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4335721/

Abstract

With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data available for disease diagnosis and prognosis, there is a need for quantitative tools to combine such varied channels of information, especially imaging and non-imaging data (e.g., spectroscopy, proteomics). The major problem in such quantitative data integration lies in reconciling the large differences in dimensionality and scale across the modalities. The primary goal of quantitative data integration is to build combined meta-classifiers; however, these efforts are hindered by challenges in (1) representing the data channels homogeneously, (2) fusing the attributes to construct an integrated feature vector, and (3) choosing the learning strategy for training the integrated classifier. In this paper, we seek to (a) define the characteristics that guide 4 independent methods for quantitative data fusion, each of which uses the idea of a meta-space for building integrated multi-modal, multi-scale meta-classifiers, and (b) understand the key components that allowed each method to succeed. These methods are (1) Generalized Embedding Concatenation (GEC), (2) Consensus Embedding (CE), (3) Semi-Supervised Multi-Kernel Graph Embedding (SeSMiK), and (4) Boosted Embedding Combination (BEC). To identify the optimal scheme for fusing imaging and non-imaging data, we compared these 4 schemes on two problems: (a) combining multi-parametric MRI with spectroscopy for prostate cancer (CaP) diagnosis, and (b) combining histological images with proteomic signatures (obtained via mass spectrometry) for predicting prognosis in CaP patients. The kernel combination approach (SeSMiK) marginally outperformed the embedding combination schemes. Additionally, intelligent weighting of the data channels (based on their relative importance) appeared to outperform unweighted strategies. All 4 strategies easily outperformed a naïve decision fusion approach, suggesting that data integration methods will play an important role in the rapidly emerging field of integrated diagnostics and personalized healthcare.
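
The idea of combining kernels with channel weights can be illustrated with a minimal sketch. This is not the SeSMiK algorithm itself: it shows only a weighted sum of per-modality Gram matrices and omits the semi-supervised graph embedding step. The toy data, the `gamma` values, the weights, and the function names `rbf_kernel` and `combine_kernels` are all hypothetical choices made for illustration. Each modality is mapped to an n-by-n similarity matrix, so channels of very different dimensionality and scale end up in a common representation before they are fused.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2) over the rows of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized Gram matrices; weights are normalized to sum to 1."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * K[i][j]
    return fused


# Hypothetical toy data: 3 samples seen through two channels that differ
# in both dimensionality (2 vs. 4 features) and numeric scale.
imaging = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80]]
spectra = [
    [100.0, 220.0, 310.0, 95.0],
    [105.0, 210.0, 300.0, 90.0],
    [400.0, 50.0, 20.0, 600.0],
]

# Per-channel gamma values (assumed) absorb the difference in scale.
K_img = rbf_kernel(imaging, gamma=1.0)
K_spec = rbf_kernel(spectra, gamma=1e-4)

# Assumed relative importance of the two channels.
K_fused = combine_kernels([K_img, K_spec], weights=[0.7, 0.3])
```

Because a non-negative weighted sum of positive semi-definite kernels is itself positive semi-definite, `K_fused` remains a valid kernel and could be handed to any learner that accepts a precomputed Gram matrix. How the weights are chosen is exactly the "intelligent weighting" question raised in the abstract; here they are simply fixed by hand.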
