College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022, China.
Department of Ultrasound, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, Hunan 410008, China.
Math Biosci Eng. 2022 Apr 20;19(6):6331-6343. doi: 10.3934/mbe.2022296.
High-throughput biological experiments are expensive and time-consuming. In recent years, many computational methods based on biological information have been proposed and widely used to understand the biological background. However, processing biological information data inevitably produces false positives and false negatives, such as the noise in Protein-Protein Interaction (PPI) networks and the noise generated by integrating a variety of biological information. Solving these noise problems is key to essential protein prediction. This paper proposes IEPMSF, a model for Identifying Essential Proteins based on non-negative Matrix Symmetric tri-Factorization and multiple biological information. The model uses only the common-neighbor characteristics of proteins in the PPI network to build a weighted network, and then applies non-negative matrix symmetric tri-factorization to find more potential interactions between proteins, thereby optimizing the weighted network. Next, the initial score of each protein is determined from subcellular location and lineal homology information, and a random walk with restart is run on the optimized network to score and rank each protein. We compared the proposed prediction model against current representative methods using a public database. The experimental results show that the new method identifies essential proteins with high efficiency. The effectiveness of the method shows that it can substantially mitigate the noise problems that exist in multi-source biological information itself and that are caused by integrating it.
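The three stages described above (common-neighbor weighting, symmetric tri-factorization, random walk with restart) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the edge weight (a plain common-neighbor count), the generic multiplicative update rules, the rank, the restart probability, the toy network, and the initial scores are all hypothetical choices made here.

```python
import numpy as np

def common_neighbor_weights(adj):
    """Weight each existing edge by the number of neighbors its endpoints share.
    Assumption: a plain count; the paper's exact weighting is not given here."""
    adj = np.asarray(adj, dtype=float)
    shared = adj @ adj          # shared[i, j] = number of common neighbors of i and j
    return adj * shared         # keep weights only where an edge exists

def symmetric_nmtf(w, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a symmetric non-negative matrix as W ~ G S G^T.
    Uses generic damped multiplicative updates (an assumption, not the paper's rules)."""
    rng = np.random.default_rng(seed)
    n = w.shape[0]
    g = rng.random((n, rank))
    s = rng.random((rank, rank))
    s = (s + s.T) / 2           # keep S symmetric
    for _ in range(n_iter):
        gs = g @ s
        g *= ((w @ gs) / (g @ (s @ (g.T @ gs)) + eps)) ** 0.25
        gtg = g.T @ g
        s *= (g.T @ w @ g) / (gtg @ s @ gtg + eps)
    return g @ s @ g.T          # dense, non-negative reconstruction

def random_walk_with_restart(w, initial, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate score = (1 - r) * P @ score + r * p0 until convergence."""
    w = np.asarray(w, dtype=float)
    col_sums = w.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid division by zero for isolated nodes
    p = w / col_sums                     # column-normalized transition matrix
    p0 = np.asarray(initial, dtype=float)
    p0 = p0 / p0.sum()
    score = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * p @ score + restart * p0
        converged = np.abs(nxt - score).sum() < tol
        score = nxt
        if converged:
            break
    return score

# Toy PPI network: two triangles sharing protein 2 (hypothetical data).
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
])
# Hypothetical initial scores; in the paper these come from subcellular
# location and homology information.
initial = np.array([0.2, 0.1, 0.4, 0.2, 0.1])

weights = common_neighbor_weights(adj)
optimized = symmetric_nmtf(weights, rank=2)
scores = random_walk_with_restart(optimized, initial)
ranking = np.argsort(-scores)            # protein indices, highest score first
```

Because the reconstruction `G S G^T` is dense, it assigns non-zero weight to protein pairs that had no edge in the input network, which is one way to read the abstract's "more potential interactions". The final `ranking` orders proteins by their steady-state score.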