IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1605-1612. doi: 10.1109/TCBB.2019.2909905. Epub 2019 Apr 9.
Breast cancer is one of the most common cancers worldwide, causing more than 450,000 deaths each year. Although this malignancy has been studied extensively, its prognosis remains poor. Because therapeutic advances can be built on gene signatures, there is an urgent need to discover breast cancer-related genes that may help uncover the mechanisms of cancer progression. We propose a deep learning method for discovering breast cancer-related genes, Capsule Network based Modeling of Multi-omics Data (CapsNetMMD). In CapsNetMMD, we use known breast cancer-related genes to recast gene identification as a supervised classification problem. Gene features are generated by comprehensively integrating multi-omics data, including mRNA expression, z scores for mRNA expression, DNA methylation, and two forms of DNA copy-number alterations (CNAs). By modeling these features with a capsule network, we identify breast cancer-related genes with significantly better performance than existing machine learning methods. The predicted genes with prognostic value may play important roles in breast cancer and may serve as candidates for biologists and medical scientists in future biomarker studies.
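The setup described in the abstract (per-gene features drawn from several omics layers, labels taken from known breast cancer-related genes, and a capsule network on top) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give the feature layout, so the layer names, the `(genes, layers, samples)` tensor shape, the helper functions, and the synthetic data are all hypothetical. The `squash` function is the standard capsule nonlinearity from the general capsule network literature and may not match the exact architecture used in CapsNetMMD.

```python
import numpy as np

# Hypothetical layer names, following the five data types listed in the abstract.
OMICS_LAYERS = [
    "mrna_expression",
    "mrna_zscore",
    "dna_methylation",
    "cna_form_1",
    "cna_form_2",
]


def build_gene_features(omics):
    """Stack per-layer (n_genes, n_samples) matrices into one
    (n_genes, n_layers, n_samples) tensor, one slice per gene.
    This layout is an assumption, not the paper's documented format."""
    return np.stack([omics[name] for name in OMICS_LAYERS], axis=1)


def label_genes(gene_names, known_related):
    """Supervised labels: 1 if the gene is in the known
    breast cancer-related set, 0 otherwise."""
    return np.array([1 if g in known_related else 0 for g in gene_names],
                    dtype=np.int64)


def squash(s, axis=-1, eps=1e-8):
    """Standard capsule squashing: keeps the vector's direction and
    maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


# Synthetic stand-in data; gene names and values are placeholders.
rng = np.random.default_rng(0)
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
n_samples = 6
omics = {name: rng.normal(size=(len(genes), n_samples))
         for name in OMICS_LAYERS}

X = build_gene_features(omics)                       # (4, 5, 6)
y = label_genes(genes, known_related={"GENE_A", "GENE_C"})
capsules = squash(X)                                 # one capsule vector per gene and layer
print(X.shape, y.tolist())
```

In this sketch each gene becomes one training example, which is what lets known cancer-related genes act as positive labels for a classifier. A real pipeline would replace the random matrices with measured omics data and feed `X` and `y` to a trainable capsule model.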