Jia Jianhua, Liu Zi, Xiao Xuan, Liu Bingxiang, Chou Kuo-Chen
Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China; School of Computer Science, University of Birmingham, Edgbaston Birmingham B15 2TT, UK.
Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China.
J Theor Biol. 2015 Jul 21;377:47-56. doi: 10.1016/j.jtbi.2015.04.011. Epub 2015 Apr 20.
A cell contains thousands of proteins, and many of its important functions are carried out by them. Proteins rarely function alone: most of the functions essential to life depend on various types of protein-protein interactions (PPIs). Knowledge of PPIs is therefore fundamental to both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, computational methods for acquiring this knowledge in a timely manner are highly desirable. Here, a new predictor called "iPPI-Esml" is developed. In the predictor, a protein sample is formulated by incorporating two types of information into the general form of PseAAC (pseudo amino acid composition): (1) the physicochemical properties derived from the constituent amino acids of a protein; and (2) the wavelet transforms derived from the numerical series along a protein chain. The operation engine of the predictor is an ensemble classifier formed by fusing seven individual random forest engines via a voting system. Tests on the benchmark dataset from Saccharomyces cerevisiae and on the dataset from Helicobacter pylori show that the new predictor achieves remarkably higher success rates than any of the existing predictors in this area. A web server for the new predictor has been established at http://www.jci-bioinfo.cn/iPPI-Esml. For the convenience of most experimental scientists, a step-by-step guide is also provided, with which users can easily obtain their desired results without needing to follow the complicated mathematics involved in the predictor's development.
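The two ingredients named in the abstract, a wavelet transform of a numerical series along the protein chain and the fusion of seven classifier outputs by voting, can be illustrated with a minimal sketch. It is not the iPPI-Esml implementation. The abstract does not say which physicochemical properties, which wavelet, or which voting rule the authors used, so the Kyte-Doolittle hydropathy index, the single-level Haar transform, and simple majority voting are all assumptions chosen for illustration. The function names `to_numeric_series`, `haar_step`, and `majority_vote` are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy index. Assumption: used here as one example
# of a physicochemical property; the abstract does not name the scales.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numeric_series(sequence):
    """Map each residue to its property value; unknown residues are skipped."""
    return [HYDROPATHY[r] for r in sequence.upper() if r in HYDROPATHY]


def haar_step(series):
    """Apply one level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists. Assumption: the
    Haar wavelet stands in for whatever wavelet the authors chose.
    """
    values = list(series)
    if len(values) % 2:
        # Pad odd-length input by repeating the last value.
        values.append(values[-1])
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale
              for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale
              for i in range(0, len(values), 2)]
    return approx, detail


def majority_vote(labels):
    """Fuse several classifier outputs by simple majority.

    With seven binary voters a tie cannot occur.
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    series = to_numeric_series("MKVLAAGIV")  # made-up example sequence
    approx, detail = haar_step(series)
    print(len(series), len(approx), len(detail))

    # Hypothetical outputs of seven base classifiers (1 = interacting).
    votes = [1, 1, 0, 1, 0, 0, 1]
    print(majority_vote(votes))
```

In a fuller pipeline, summary statistics of the wavelet coefficients would be appended to the feature vector of each protein pair, and each of the seven votes would come from a trained random forest rather than a hard-coded list.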