Moutselos Konstantinos, Maglogiannis Ilias, Chatziioannou Aristotelis
Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, 35100 Lamia, Greece.
Department of Digital Systems, University of Piraeus, Grigoriou Lampraki 126, 18532 Piraeus, Greece.
Biomed Res Int. 2014;2014:145243. doi: 10.1155/2014/145243. Epub 2014 Jan 16.
This work studies the effects of simple imputation on the integration of multimodal data originating from different patients. Two separate cutaneous melanoma datasets are used: an image analysis (dermoscopy) dataset and a transcriptomic (DNA microarray) dataset. Each modality covers a different set of patients, and four imputation methods are employed to form a unified, integrative dataset. Backward selection with ensemble classifiers (random forests), followed by principal component analysis and linear discriminant analysis, illustrates how the imputations affect feature selection and dimensionality reduction. The results suggest that expanding the feature space through data integration, achieved here with imputation schemes, generally aids the classification task and makes the derivation of putative classifiers more stable. In particular, although the biased imputation methods significantly increase the predictive performance and class discrimination of the datasets, they still contribute to the study of prominent features and their relations. Fusing separate datasets that provide a multimodal description of the same pathology is an innovative and promising avenue that supports the derivation of robust composite biomarkers and aids interpretation of the biomedical problem under study.
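The integration step described in the abstract can be pictured with a minimal sketch. The abstract does not name the four imputation methods, so the two strategies below (overall column mean, and class-conditional column mean as an example of a biased scheme) are illustrative assumptions, as are the function names `column_means` and `integrate` and the toy feature values. The sketch also assumes that every class label appears in both modalities.

```python
from statistics import mean


def column_means(rows, labels=None):
    """Per-column means: overall if labels is None, else one list per class."""
    if labels is None:
        return [mean(col) for col in zip(*rows)]
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*group)]
            for label, group in by_class.items()}


def integrate(img_rows, img_labels, gene_rows, gene_labels, class_aware=False):
    """Build one table holding image and gene features for every patient.

    Each patient has only one modality measured, so the other block is
    filled with means taken from the patients who do have that modality.
    class_aware=False uses the overall mean and ignores the class label.
    class_aware=True uses the mean of same-class patients, which is a
    biased scheme because the label leaks into the imputed features.
    """
    if class_aware:
        img_fill = column_means(img_rows, img_labels)
        gene_fill = column_means(gene_rows, gene_labels)
    else:
        # Same fill vector for every class, keyed by label for uniform lookup.
        all_labels = set(img_labels) | set(gene_labels)
        img_mean, gene_mean = column_means(img_rows), column_means(gene_rows)
        img_fill = {label: img_mean for label in all_labels}
        gene_fill = {label: gene_mean for label in all_labels}

    table = []
    # Image patients: measured image features + imputed gene features.
    for row, label in zip(img_rows, img_labels):
        table.append(list(row) + list(gene_fill[label]))
    # Gene patients: imputed image features + measured gene features.
    for row, label in zip(gene_rows, gene_labels):
        table.append(list(img_fill[label]) + list(row))
    return table, list(img_labels) + list(gene_labels)


# Toy data (hypothetical values): two image features, one gene feature.
img_rows = [[1.0, 2.0], [3.0, 4.0], [8.0, 6.0]]
img_labels = ["nevus", "nevus", "melanoma"]
gene_rows = [[5.0], [7.0], [9.0]]
gene_labels = ["nevus", "melanoma", "melanoma"]

unbiased_table, labels = integrate(img_rows, img_labels, gene_rows, gene_labels)
biased_table, _ = integrate(img_rows, img_labels, gene_rows, gene_labels,
                            class_aware=True)
```

In the class-aware table the imputed columns already separate the classes, which is one way a biased scheme can raise apparent predictive performance; a classifier trained on such a table should be evaluated with that leakage in mind.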