Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX 79409, USA.
Department of Mathematics and Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i42-i50. doi: 10.1093/bioinformatics/btab336.
Predicting anti-cancer drug sensitivity for individual cell lines with deep learning models is a significant challenge in personalized medicine. Recently developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) CNN (Convolutional Neural Network)-based models have shown promising results in improving drug sensitivity prediction. The primary idea behind REFINED-CNN is to represent high-dimensional vectors as compact images with spatial correlations that CNN architectures can exploit. However, the mapping from a high-dimensional vector to a compact 2D image depends on the a priori choice of distance metric and projection scheme, and only limited empirical procedures exist to guide these choices.
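The vector-to-image idea described above can be illustrated with a minimal sketch. It is a simplified stand-in, not the REFINED procedure itself: it assumes a correlation-based feature distance, classical (eigendecomposition) MDS as the projection scheme, and a rank-based grid assignment. All function names are hypothetical.

```python
import numpy as np

def feature_distance(X):
    """Distance between features (columns of X), assumed here to be 1 - |Pearson r|."""
    corr = np.corrcoef(X, rowvar=False)
    return 1.0 - np.abs(corr)

def classical_mds(D, dim=2):
    """Embed points with pairwise distances D into `dim` dimensions (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]           # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def coords_to_grid(coords, side):
    """Give every feature a unique pixel index on a side x side grid.

    Features are split into rows by their first coordinate and ordered
    within each row by their second coordinate (a crude, rank-based layout).
    """
    n = coords.shape[0]
    if n > side * side:
        raise ValueError("grid is too small for the number of features")
    order_x = np.argsort(coords[:, 0])
    pixel = np.empty(n, dtype=int)
    for r in range(side):
        row = order_x[r * side:(r + 1) * side]
        row = row[np.argsort(coords[row, 1])]
        pixel[row] = r * side + np.arange(len(row))
    return pixel

def to_image(x, pixel, side):
    """Place one feature vector x into its 2D image using the pixel map."""
    img = np.zeros(side * side)
    img[pixel] = x
    return img.reshape(side, side)
```

Changing `feature_distance` or `classical_mds` changes the resulting pixel map, which is the dependence on distance metric and projection scheme noted in the abstract.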
In this article, we consider ensembles of REFINED-CNN models built under different choices of distance metric and/or projection scheme, which can improve upon a REFINED-CNN model based on a single projection. Results on the NCI60 and NCI-ALMANAC databases demonstrate that the ensemble approaches can provide a significant improvement in prediction performance compared to individual models. We also develop a theoretical framework for combining different distance metrics to arrive at a single 2D mapping. The distance-averaged REFINED-CNN achieved performance comparable to that of the stacking REFINED-CNN ensemble, but at significantly lower computational cost.
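The two combination strategies mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formulation: distance averaging is taken to be a weighted mean of max-normalized distance matrices, and stacking is taken to be a linear meta-learner fitted by least squares on base-model predictions. The function names are hypothetical.

```python
import numpy as np

def average_distances(mats, weights=None):
    """Combine several feature-distance matrices into one (assumed: weighted mean).

    Each matrix is scaled by its maximum so that metrics on different
    scales contribute comparably. The result feeds a single 2D mapping.
    """
    scaled = [np.asarray(D, dtype=float) / np.max(D) for D in mats]
    if weights is None:
        weights = np.full(len(scaled), 1.0 / len(scaled))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * D for w, D in zip(weights, scaled))

def fit_stacker(base_preds, y):
    """Fit a linear meta-learner (with intercept) on base-model predictions.

    base_preds has shape (n_samples, n_models); each column would hold the
    held-out predictions of one model trained on one mapping.
    """
    A = np.column_stack([np.ones(len(y)), base_preds])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def stack_predict(coef, base_preds):
    """Combine base-model predictions with the fitted meta-learner."""
    A = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return A @ coef
```

The cost difference is visible in the structure: stacking needs one trained model per mapping plus a meta-learner, whereas distance averaging yields one mapping and therefore one model to train.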
The source code, scripts, and data used in the paper have been deposited in GitHub (https://github.com/omidbazgirTTU/IntegratedREFINED).
Supplementary data are available at Bioinformatics online.