Viswanath Satish E, Tiwari Pallavi, Lee George, Madabhushi Anant
Department of Biomedical Engineering, Case Western Reserve University, 10900 Euclid Ave, Wickenden 523, Cleveland, OH, USA.
BMC Med Imaging. 2017 Jan 5;17(1):2. doi: 10.1186/s12880-016-0172-6.
With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data being routinely acquired for disease characterization, there is a pressing need for quantitative tools to combine these varied channels of information. The goal of such integrated predictors is to combine these heterogeneous sources of information while improving on the predictive ability of any individual modality. A number of application-specific data fusion methods have previously been proposed in the literature to reconcile differences in dimensionality and length scale across modalities. Our objective in this paper was to help identify the methodological choices that must be made when building a data fusion technique, as it is not always clear which strategy is optimal for a particular problem. As a comprehensive review of all possible data fusion methods was outside the scope of this paper, we focused on fusion approaches that employ dimensionality reduction (DR).
In this work, we quantitatively evaluate 4 non-overlapping existing instantiations of DR-based data fusion within 3 different biomedical applications comprising over 100 studies. These instantiations use different knowledge representation and knowledge fusion methods, allowing us to examine the interplay of these modules in the context of data fusion. The use cases considered in this work involve the integration of (a) radiomics features from T2w MRI with peak area features from MR spectroscopy for identification of prostate cancer in vivo, (b) histomorphometric features (quantitative features extracted from histopathology) with protein mass spectrometry features for predicting 5-year biochemical recurrence in prostate cancer patients, and (c) volumetric measurements from T1w MRI with protein expression features to discriminate between patients with and without Alzheimer's disease.
Our preliminary results in these specific use cases indicated that kernel representations used in conjunction with DR-based fusion may be most effective: a weighted multi-kernel-based DR approach yielded the highest area under the ROC curve (AUC), above 0.8. By contrast, non-optimized DR-based representation and fusion methods yielded the worst predictive performance across all 3 applications. Our results also suggest that when the individual modalities show relatively poor discriminability, many of the data fusion methods may fail to yield accurate, discriminatory representations as well. In summary, to outperform the predictive ability of individual modalities, methodological choices for data fusion must explicitly account for the sparsity of, and noise in, the feature space.
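The abstract does not specify which kernels, weights, or DR method the weighted multi-kernel approach uses. The following is a minimal NumPy sketch contrasting two generic ideas: a naive baseline that standardizes and concatenates features before linear PCA, and a weighted multi-kernel fusion that builds one Gaussian (RBF) kernel per modality, takes a weighted sum, and embeds the result with kernel PCA. The RBF kernel, the `1 / n_features` bandwidth heuristic, the hand-picked weights, kernel PCA as the DR step, all function names, and the synthetic data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def zscore(X):
    # Standardize each feature column to zero mean and unit variance.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sd

def rbf_kernel(X, gamma):
    # Gaussian kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kernel_pca(K, n_components):
    # Double-center the kernel, then keep the top eigenvectors,
    # scaled by sqrt(eigenvalue), as embedding coordinates.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(H @ K @ H)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def naive_fusion(modalities, n_components):
    # Hypothetical baseline: concatenate standardized features, then linear PCA via SVD.
    X = np.hstack([zscore(M) for M in modalities])
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def fused_kernel(modalities, weights, gammas=None):
    # Hypothetical weighted sum of per-modality RBF kernels (weights normalized to sum to 1).
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if gammas is None:
        gammas = [1.0 / M.shape[1] for M in modalities]  # assumed bandwidth heuristic
    return sum(wi * rbf_kernel(zscore(M), g) for wi, M, g in zip(w, modalities, gammas))

def weighted_multikernel_fusion(modalities, weights, n_components, gammas=None):
    # Fuse in kernel space, then reduce dimensionality with kernel PCA.
    return kernel_pca(fused_kernel(modalities, weights, gammas), n_components)

# Synthetic example: two modalities with very different dimensionalities.
rng = np.random.default_rng(0)
n = 40
labels = np.repeat([0, 1], n // 2)
A = rng.normal(size=(n, 10)) + 0.8 * labels[:, None]  # few features, weak class shift
B = rng.normal(size=(n, 200))                          # many noisy features
B[:, :5] += 1.0 * labels[:, None]                      # signal in only 5 of 200

Z_naive = naive_fusion([A, B], n_components=3)
Z_mkl = weighted_multikernel_fusion([A, B], weights=[0.7, 0.3], n_components=3)
print(Z_naive.shape, Z_mkl.shape)  # (40, 3) (40, 3)
```

Fusing in kernel space lets each modality contribute an n-by-n similarity matrix regardless of its native dimensionality, which is one way to sidestep the dimensionality mismatch the abstract describes. The abstract contrasts a weighted multi-kernel approach with non-optimized methods; here the weights are fixed by hand, and choosing them (for example by cross-validated classifier performance on the embedding) is an assumption rather than a detail taken from the source.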