Basith Shaherin, Manavalan Balachandran, Shin Tae Hwan, Lee Gwang
Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.
Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea.
Comput Struct Biotechnol J. 2018 Oct 24;16:412-420. doi: 10.1016/j.csbj.2018.10.007. eCollection 2018.
Growth hormone binding protein (GHBP) is a soluble carrier protein that interacts selectively and non-covalently with growth hormone, thereby acting as a modulator or inhibitor of growth hormone signalling. Accurate identification of GHBPs from protein sequences provides important clues for understanding cell growth and cellular mechanisms. In the postgenomic era, protein sequence data have accumulated in abundance, so it is crucial to develop an automated computational method that enables fast and accurate identification of putative GHBPs among a vast number of candidate proteins. In this study, we describe a novel machine-learning-based predictor, called iGHBP, for the identification of GHBPs. To predict GHBPs from a given protein sequence, we trained an extremely randomised tree on an optimal feature set, which was obtained by applying a two-step feature selection protocol to a combination of dipeptide composition and amino acid index values. In cross-validation analysis, iGHBP achieved an accuracy of 84.9%, which was ~7% higher than that of the control extremely randomised tree predictor trained with all features, demonstrating the effectiveness of our feature selection protocol. Furthermore, when objectively evaluated on an independent data set, iGHBP displayed superior performance compared to the existing method. Additionally, a user-friendly web server that implements iGHBP has been established and is available at http://thegleelab.org/iGHBP.
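As an illustration of one of the two feature types named in the abstract, the following is a minimal sketch of how a dipeptide composition vector can be computed from a protein sequence. It is not the iGHBP implementation: the function name `dipeptide_composition`, the restriction to the 20 standard amino acids, and the choice to normalise by the number of valid residue pairs are all assumptions made for this example. The amino acid index features, the two-step feature selection, and the classifier itself are not shown.

```python
from itertools import product

# The 20 standard amino acids (assumption: non-standard residues are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, in a fixed order that defines the feature index.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for a protein sequence.

    Hypothetical helper for illustration only. Each value is the count of one
    adjacent residue pair divided by the number of valid pairs in the sequence.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Usage: the toy sequence "ACAC" contains the pairs AC, CA, AC.
features = dipeptide_composition("ACAC")
print(len(features))                      # 400
print(features[DIPEPTIDES.index("AC")])   # 2 of the 3 pairs are "AC"
```

A vector of this kind could then be concatenated with other descriptors and passed to a tree-ensemble classifier; which software was used to train the extremely randomised tree is not stated in the abstract.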