Jain Prerna, Thukral Nitin, Gahlot Lokesh Kumar, Hasija Yasha
Department of Biotechnology, Delhi Technological University, Shahbad Daulatpur, Main Bawana Road, Delhi, 110042 India.
Syst Synth Biol. 2015 Jun;9(1-2):55-66. doi: 10.1007/s11693-015-9164-z. Epub 2015 Mar 14.
Interactions between proteins largely govern cellular processes, and numerous efforts to characterize them have produced an enormous body of information on proteins, their interactions, and the functions those interactions determine. The present study reports an interface analysis of cardiovascular disorder (CVD) related proteins to shed light on the details of their interactions and to emphasize the importance of using structures in network studies. It combines a network-centred approach with three-dimensional structural studies to better understand the underlying biology. Interface properties were used as descriptors to classify proteins as CVD-associated or non-CVD-associated. A machine learning algorithm was used to generate a classifier from the training set, which was then used to predict potential CVD-related proteins from a set of polymorphic proteins not known to be involved in any disease. Among the several classification algorithms applied to generate models, Random Forest performed best, with an accuracy of 69.5%. The tool based on this prediction model, CARDIO-PRED, is available at http://www.genomeinformatics.dce.edu/CARDIO-PRED/. The predicted CVD-related proteins may not be causal factors of a particular disease, but they may be involved in pathways and reactions that are not yet known, which permits a more rational analysis of disease mechanisms. Studying their interactions with other proteins can significantly improve our understanding of the molecular mechanisms of disease.
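The classification workflow described in the abstract (interface descriptors as features, a Random Forest trained on labelled proteins, then scoring of unlabelled candidates) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy; the descriptor names, the synthetic data, and all hyperparameters are hypothetical placeholders, since the abstract does not list the actual features or settings, and the sketch is not the CARDIO-PRED implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical descriptor names: the abstract does not specify
# which interface properties were actually used.
FEATURES = [
    "interface_area",
    "hydrophobic_fraction",
    "n_hydrogen_bonds",
    "n_interface_residues",
]


def make_synthetic_dataset(n_per_class=200, seed=0):
    """Generate placeholder data: label 1 = CVD-associated, 0 = non-CVD."""
    rng = np.random.default_rng(seed)
    non_cvd = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, len(FEATURES)))
    cvd = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, len(FEATURES)))
    X = np.vstack([non_cvd, cvd])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Fit a Random Forest and report accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


X, y = make_synthetic_dataset()
model, accuracy = train_and_evaluate(X, y)

# Score unlabelled candidates (standing in for polymorphic proteins
# with no known disease association).
candidates = np.random.default_rng(1).normal(loc=0.25, size=(5, len(FEATURES)))
cvd_probability = model.predict_proba(candidates)[:, 1]

print(f"held-out accuracy: {accuracy:.3f}")
for i, p in enumerate(cvd_probability):
    print(f"candidate {i}: P(CVD-associated) = {p:.2f}")
```

Reporting a class probability rather than a hard label is a design choice of this sketch: with a model whose accuracy is moderate, a ranked list of candidates is usually easier to follow up experimentally than a binary call.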