Li Min, Zheng Ruiqing, Zhang Hanhui, Wang Jianxin, Pan Yi
School of Information Science and Engineering, Central South University, Changsha 410083, China; State Key Laboratory of Medical Genetics, Central South University, Changsha 410078, China.
School of Information Science and Engineering, Central South University, Changsha 410083, China.
Methods. 2014 Jun 1;67(3):325-33. doi: 10.1016/j.ymeth.2014.02.016. Epub 2014 Feb 21.
Identifying essential proteins is important for understanding the minimal requirements for cellular life and is necessary for practical applications such as drug design. Advances in high-throughput technologies have made a large number of protein-protein interactions available, which makes it possible to detect protein essentiality at the network level. Considering that most species already have a number of known essential proteins, we proposed a new prior-knowledge-based scheme to discover new essential proteins from protein interaction networks. Based on this scheme, two essential protein discovery algorithms, CPPK and CEPPK, were developed. CPPK predicts new essential proteins from network topology alone, whereas CEPPK integrates network topology with gene expression data. The performance of CPPK and CEPPK was validated on the protein interaction network of Saccharomyces cerevisiae. The experimental results showed that prior knowledge of known essential proteins was effective in improving prediction precision. The prediction precision of CPPK and CEPPK clearly exceeded that of 10 previously proposed essential protein discovery methods: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), and Network Centrality (NC). In particular, CPPK achieved a 40% improvement in precision over BC, CC, SC, EC, and BN, and CEPPK performed even better. CEPPK was also compared with four other methods that are not node centralities (EPC, ORFL, PeC, and CoEWC) and was shown to achieve the best results.
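The abstract does not give the CPPK or CEPPK scoring formulas, so the following is only a minimal sketch of the general idea it describes: rank candidate proteins by a topology-based score and measure precision among the top-ranked candidates. Degree Centrality (DC) is computed as the number of interaction partners. The function `known_essential_neighbors` is a hypothetical stand-in for a prior-knowledge score and is not the published CPPK formula; the protein names, edges, and essentiality labels are toy data invented for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (protein, protein) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_centrality(adj):
    """DC: number of interaction partners of each protein."""
    return {p: len(nbrs) for p, nbrs in adj.items()}


def known_essential_neighbors(adj, known):
    """Hypothetical prior-knowledge score (NOT the published CPPK formula):
    number of neighbors that are already known to be essential."""
    return {p: len(nbrs & known) for p, nbrs in adj.items()}


def top_k_precision(scores, known, truth, k):
    """Rank proteins outside the known set by score (ties broken by name)
    and return the fraction of the top k that are truly essential."""
    candidates = [p for p in scores if p not in known]
    ranked = sorted(candidates, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(p in truth for p in top) / len(top)


# Toy network: all names and labels below are invented for illustration.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("G", "H")]
known = {"A", "B"}       # essential proteins assumed known in advance
truth = {"A", "B", "C"}  # full (toy) set of essential proteins

adj = build_graph(edges)
dc_precision = top_k_precision(degree_centrality(adj), known, truth, k=1)
prior_precision = top_k_precision(
    known_essential_neighbors(adj, known), known, truth, k=1)
print("DC precision@1:", dc_precision)
print("prior-knowledge precision@1:", prior_precision)
```

In this toy network the highest-degree candidate is a hub that is not essential, while the candidate with the most known-essential neighbors is. This mirrors, in a very simplified way, why prior knowledge could help a purely topological ranking; it says nothing about the magnitude of the improvement reported in the paper.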