Bioinformatics Research Lab, The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, 70503, USA.
Bioinformatics Research Lab, The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, 70503, USA; Informatics Research Institute, University of Louisiana at Lafayette, Lafayette, LA, 70506, USA.
Comput Biol Med. 2021 Jun;133:104323. doi: 10.1016/j.compbiomed.2021.104323. Epub 2021 Apr 5.
Mutations in proto-oncogenes (ONGO) and the loss of regulatory function of tumor suppressor genes (TSG) are common underlying mechanisms of uncontrolled tumor growth. While cancer is a heterogeneous complex of distinct diseases, computationally estimating how likely a gene is to function as an ONGO or a TSG can help develop drugs that target the disease. This paper proposes a classification method that starts with a preprocessing stage to extract sets of feature maps from the input 3D protein structural information. The next stage is a deep convolutional neural network (DCNN) that outputs the probability of each functional class for a gene. We explored and tested two approaches: in Approach 1, all filtered and cleaned 3D protein structures (PDB) are pooled together, whereas in Approach 2, the primary structures and their corresponding PDBs are separated according to the genes' primary structural information. Following the DCNN stage, a dynamic programming-based method determines the final prediction of each primary structure's functionality. We validated the proposed method using the COSMIC online database. For the ONGO vs. TSG classification problem, the DCNN-stage AUROCs for Approach 1 and Approach 2 are 0.978 and 0.765, respectively. The AUROCs of the final classification of the genes' primary structure functionality for Approach 1 and Approach 2 are 0.989 and 0.879, respectively. For comparison, the current state-of-the-art reported AUROC is 0.924. Our results warrant further study applying the deep learning models to human (GRCh38) genes to predict their probabilities of functioning as cancer drivers.
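The two reporting levels described above (per-structure DCNN scores and per-gene final predictions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, probabilities, and labels are toy values, not data from the study, and the per-gene step uses a plain mean as a placeholder, since the abstract does not specify the dynamic programming-based method. AUROC is computed with the standard pairwise (Mann-Whitney) definition.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gene_level_scores(structure_probs):
    """Average the per-structure probabilities of each gene.

    Placeholder aggregation (assumption): the paper uses a dynamic
    programming-based method whose details are not given in the abstract.
    """
    grouped = defaultdict(list)
    for gene, prob in structure_probs:
        grouped[gene].append(prob)
    return {gene: sum(ps) / len(ps) for gene, ps in grouped.items()}


# Toy data (hypothetical): (gene, P(ONGO)) for each 3D structure.
structure_probs = [
    ("GENE_A", 0.9), ("GENE_A", 0.3),
    ("GENE_B", 0.2), ("GENE_B", 0.4),
    ("GENE_C", 0.7),
    ("GENE_D", 0.5),
]
# Toy labels (hypothetical): 1 = ONGO, 0 = TSG.
gene_labels = {"GENE_A": 1, "GENE_B": 0, "GENE_C": 1, "GENE_D": 0}

# Structure-level AUROC: every structure inherits its gene's label.
struct_auc = auroc(
    [gene_labels[g] for g, _ in structure_probs],
    [p for _, p in structure_probs],
)

# Gene-level AUROC: one aggregated score per gene.
gene_scores = gene_level_scores(structure_probs)
genes = sorted(gene_scores)
gene_auc = auroc([gene_labels[g] for g in genes], [gene_scores[g] for g in genes])

print(f"structure-level AUROC: {struct_auc:.3f}")
print(f"gene-level AUROC: {gene_auc:.3f}")
```

In this toy case a single low-scoring structure of GENE_A lowers the structure-level AUROC, while aggregation per gene recovers a perfect ranking. That is one way a gene-level AUROC can exceed the structure-level one, although the sketch says nothing about why it does so in the reported results.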