Mohaiminul Islam Md, Huang Shujun, Ajwad Rasif, Chi Chen, Wang Yang, Hu Pingzhao
Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba R3E 0W3, Canada.
Department of Computer Science, University of Manitoba, Winnipeg, Manitoba R3E 0W3, Canada.
Comput Struct Biotechnol J. 2020 Aug 11;18:2185-2199. doi: 10.1016/j.csbj.2020.08.005. eCollection 2020.
Classification of breast cancer subtypes using multi-omics profiles is a difficult problem because the data sets are high-dimensional and highly correlated. Deep neural network (DNN) learning has demonstrated advantages over traditional methods: it does not require hand-crafted features, but instead automatically extracts features from raw data and can efficiently analyze high-dimensional, correlated data. We aim to develop an integrative deep learning framework for classifying the molecular subtypes of breast cancer. We collect copy number alteration (CNA) and gene expression data measured on the same breast cancer patients from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). We propose a deep learning model that integrates these omics data sets to predict the patients' molecular subtypes, and we compare its performance with several baseline models. Furthermore, we use the learned deep features to evaluate subtype misclassification and explore their usefulness for clustering breast cancer patients. We demonstrate that our integrative deep learning model outperforms other deep learning and non-deep learning based models. In particular, among the deep learning-based integration models, the best prediction results are obtained when the two data sources are integrated through a concatenation layer without weight sharing. Using the learned deep features, we identify 6 breast cancer subgroups and show that Her2-enriched samples can be classified into more than one tumor subtype. Overall, the integrated model shows better performance than models trained on individual data sources.
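To make "integration through a concatenation layer without weight sharing" concrete, here is a minimal numpy sketch of a forward pass. It is an illustration under stated assumptions, not the authors' published architecture: the function names, the single hidden layer per branch, the hidden size, the toy input sizes, and the choice of 5 output classes are all hypothetical. Each omics input (CNA and gene expression) is mapped by its own weight matrix, the two branch outputs are concatenated per patient, and a softmax layer produces subtype probabilities. Training (loss and backpropagation) is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_cna, n_expr, n_hidden, n_classes, seed=0):
    """Hypothetical parameters: each omics branch owns separate weights (no sharing)."""
    rng = np.random.default_rng(seed)
    return {
        "W_cna": rng.normal(0.0, 0.01, size=(n_cna, n_hidden)),
        "b_cna": np.zeros(n_hidden),
        "W_expr": rng.normal(0.0, 0.01, size=(n_expr, n_hidden)),
        "b_expr": np.zeros(n_hidden),
        # The classifier sees the concatenation of both branch outputs
        "W_out": rng.normal(0.0, 0.01, size=(2 * n_hidden, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def forward(params, x_cna, x_expr):
    # Branch-specific hidden representations, computed with unshared weights
    h_cna = relu(x_cna @ params["W_cna"] + params["b_cna"])
    h_expr = relu(x_expr @ params["W_expr"] + params["b_expr"])
    # Concatenation layer: join the two learned representations per patient
    features = np.concatenate([h_cna, h_expr], axis=1)
    probs = softmax(features @ params["W_out"] + params["b_out"])
    return features, probs


# Toy usage with random data (sizes are illustrative, not the study's)
rng = np.random.default_rng(1)
n_patients, n_cna, n_expr = 8, 200, 300
params = init_params(n_cna, n_expr, n_hidden=32, n_classes=5)
x_cna = rng.normal(size=(n_patients, n_cna))
x_expr = rng.normal(size=(n_patients, n_expr))
features, probs = forward(params, x_cna, x_expr)
print(features.shape, probs.shape)  # (8, 64) (8, 5)
```

The concatenated `features` array plays the role of the "learned deep features": in a trained model, this per-patient representation is what could be passed to a clustering algorithm to look for patient subgroups. A shared-weight variant would instead apply one matrix to both inputs, which requires the two inputs to have the same dimensionality.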