Saha Sovan, Chatterjee Piyali, Basu Subhadip, Nasipuri Mita, Plewczynski Dariusz
Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal, India.
Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India.
PeerJ. 2019 May 22;7:e6830. doi: 10.7717/peerj.6830. eCollection 2019.
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. In the post-genomic era, next-generation sequencing is performed routinely at population scale for a variety of species. The challenge is to determine, at large scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Experimental identification of protein function is laborious, time-consuming and resource-intensive. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present FunPred 3.0, an improved protein function prediction tool and an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein-protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the functional annotations available in the PPIN of the latest Munich Information Center for Protein Sequences (MIPS) dataset. After elimination of self-replicating and self-interacting protein pairs, the PPIN data in the MIPS dataset comprise 4,554 unique proteins in 13,528 protein-protein interactions. Using FunPred 3.0, we achieve mean precision, recall and F1-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (with incomplete and missing functional annotations) in the MIPS dataset. The method can also predict the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
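As a quick consistency check on the reported metrics, the sketch below recomputes the F1-score as the harmonic mean of the stated mean precision and mean recall. It is an illustration only and is not taken from the FunPred 3.0 code: the function name `f1_score` is hypothetical, and it assumes the reported score is consistent with the F1 of the mean values (a mean of per-protein F1-scores could differ in general).

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean values reported in the abstract.
reported_precision = 0.55
reported_recall = 0.82

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.66
```

The result, 2 × 0.55 × 0.82 / (0.55 + 0.82) ≈ 0.658, rounds to the reported value of 0.66.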