Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 30013, Taiwan.
Department of Electrical Engineering and Graduate Institute of Communication Engineering, National Taiwan University, Taipei, 10617, Taiwan.
Sci Rep. 2021 Jul 21;11(1):14914. doi: 10.1038/s41598-021-92864-y.
Breast cancer is a heterogeneous disease. Guiding treatment decisions for each patient requires robust prognostic biomarkers that allow reliable prognosis prediction. Gene feature selection based on microarray data is one approach to discovering potential biomarkers systematically. However, standard purely statistical feature selection approaches often fail to incorporate prior biological knowledge and therefore select genes that offer little biological insight. In addition, because microarray data are high-dimensional with small sample sizes, selecting robust gene features is intrinsically challenging. In this study, we therefore combined systems biology feature selection with ensemble learning, aiming to select genes that are both biologically meaningful and robustly predictive of prognosis. Moreover, to capture the complex molecular processes of breast cancer, we adopted a multi-gene approach and predicted prognosis status using deep learning classifiers. We found that all ensemble approaches improved feature selection robustness, with the hybrid ensemble approach yielding the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, which was further verified by survival analysis. In summary, this study demonstrates the potential of combining ensemble learning and bimodal DNNs to guide precision medicine.
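To make the idea of ensemble feature selection concrete, the following is a minimal sketch of one common variant, a data-perturbation ensemble: the same gene scorer is run on many bootstrap resamples and the resulting rankings are averaged. It is an illustration under stated assumptions, not the procedure used in the study. The scorer here is a plain Welch-style t statistic standing in for the systems biology feature selector, which the abstract does not specify, and the function names (`gene_scores`, `ensemble_select`, `make_toy_data`) and the synthetic data are hypothetical.

```python
import numpy as np


def gene_scores(X, y):
    """Absolute Welch-style t statistic per gene (a stand-in scorer only)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Small constant guards against zero variance in a degenerate resample.
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)


def ensemble_select(X, y, k, n_rounds=50, seed=0):
    """Select k genes by averaging ranks over stratified bootstrap resamples."""
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        # Resample within each class so both classes are always present.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
            for c in (0, 1)
        ])
        scores = gene_scores(X[idx], y[idx])
        # Convert scores to ranks: rank 0 is the highest-scoring gene.
        rank_sum += np.argsort(np.argsort(-scores))
    # Genes with the lowest summed rank are the most consistently top-ranked.
    return np.argsort(rank_sum)[:k]


def make_toy_data(n_samples=40, n_genes=200, n_informative=5, seed=1):
    """Synthetic high-dimensional, low-sample-size data (assumed, not real)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_genes))
    # Shift the first n_informative genes in class 1 so they carry signal.
    X[y == 1, :n_informative] += 3.0
    return X, y
```

Calling `ensemble_select(*make_toy_data(), k=5)` returns the indices of the five genes with the best average rank. Averaging ranks over resamples is what damps the sensitivity of a single selection run to the particular samples drawn, which is the robustness concern the abstract raises for high-dimensional, low-sample-size data.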