Fernandez-Lozano Carlos, Gestal Marcos, González-Díaz Humberto, Dorado Julián, Pazos Alejandro, Munteanu Cristian R
Information and Communication Technologies Department, Faculty of Computer Science, University of A Coruña, 15071 A Coruña, Spain.
J Theor Biol. 2014 May 21;349:12-21. doi: 10.1016/j.jtbi.2014.01.033. Epub 2014 Jan 31.
Cell death (CD) is a dynamic biological function involved in both physiological and pathological processes. Because of the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors were calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atomic physicochemical properties, and the protein 3D regions. Machine learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm for this problem was K*. After applying several feature subset selection methods, the best model found uses only 11 variables and achieves an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% on the validation set. The best model was then applied to 7409 protein chains labeled "unknown function" in the PDB Databank in order to predict CD-related biological activity. As a result, several Homo sapiens proteins are predicted to have CD-related functions: 3DRX, involved in the virus-host interaction biological process and protein homooligomerization; 4DWF, involved in cell differentiation, chromatin modification, DNA damage response, and protein stabilization; 1IUR, involved in ATP binding and chaperone binding; 1J7D, involved in DNA double-strand break processing, histone ubiquitination, and nucleotide-binding oligomerization; 1UTU, linked with DNA repair and regulation of transcription; and 3EEC, participating in cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle, and apoptotic process.
Other proteins from bacteria predicted as CD-related are 2G3V, a CAG pathogenicity island protein 13 from Helicobacter pylori; 4G5A, a hypothetical protein from Bacteroides thetaiotaomicron; 1YLK, involved in the nitrogen metabolism of Mycobacterium tuberculosis; and 1XSV, with possible DNA/RNA-binding domains. The results demonstrate that CD-related proteins can be predicted from molecular information encoded in the protein 3D structure. Thus, the current work demonstrates the possibility of predicting new molecular targets involved in cell-death processes.
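To make the idea of Markov Mean Properties more concrete, the following is a simplified sketch and not the MInD-Prot implementation. It assumes that residues are linked in the contact network when their representative atoms (hypothetically C-alpha) lie within a cutoff distance (the 7.0 Å default is an arbitrary choice), that a single positive atomic property serves as the node weight w, and that the descriptor follows a Markov-chain formulation of the kind used in this family of descriptors: transition probabilities p_ij = a_ij·w_j / Σ_k a_ik·w_k and a k-th order mean property π_k = p0·P^k·w, with initial probabilities p0_j = w_j / Σw. The split into protein 3D regions and the averaging over five properties are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_matrix(coords: Sequence[Coord], cutoff: float = 7.0) -> List[List[int]]:
    """Binary amino acid contact network (assumption: residue i and j are
    in contact when their representative atoms are within `cutoff` angstroms)."""
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)] for i in range(n)]


def transition_matrix(adj: List[List[int]], weights: Sequence[float]) -> List[List[float]]:
    """Row-stochastic matrix with p_ij = a_ij * w_j / sum_k a_ik * w_k."""
    n = len(adj)
    P = []
    for i in range(n):
        denom = sum(adj[i][k] * weights[k] for k in range(n))
        if denom == 0:
            # Assumption: a residue with no contacts keeps all its probability.
            P.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            P.append([adj[i][j] * weights[j] / denom for j in range(n)])
    return P


def markov_mean_properties(adj: List[List[int]], weights: Sequence[float],
                           max_order: int = 5) -> List[float]:
    """Return [pi_0, ..., pi_max_order] where pi_k = p0 . P^k . w
    and p0_j = w_j / sum(w). Weights are assumed strictly positive."""
    total = sum(weights)
    dist = [w / total for w in weights]  # initial probability vector p0
    P = transition_matrix(adj, weights)
    n = len(weights)
    means = []
    for _ in range(max_order + 1):
        # Expected property value under the current state distribution.
        means.append(sum(d * w for d, w in zip(dist, weights)))
        # Advance one Markov step: dist_j <- sum_i dist_i * P_ij.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return means
```

For example, four residues spaced 3.8 Å apart on a straight line with a 4.0 Å cutoff form a simple chain; with weights 1, 2, 3, 4 the sketch gives π_0 = 3.0 and π_1 = 2.9. With uniform weights every π_k equals the common weight, which is a quick sanity check that each step preserves a probability distribution.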