School of Information Management, Shanghai Lixin University of Accounting and Finance, Shanghai, China.
Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China.
BMC Bioinformatics. 2022 Aug 16;23(Suppl 8):339. doi: 10.1186/s12859-022-04874-w.
Essential proteins are indispensable to the development and survival of cells. Identifying them not only helps in understanding the minimal requirements for cell survival, but also has practical significance for disease diagnosis, drug design and medical treatment. With the rapid accumulation of protein-protein interaction (PPI) data, computational identification of essential proteins from protein-protein interaction networks (PINs) has become increasingly popular, and a number of PIN-based approaches have been developed to date.
In this paper, we propose a new and effective approach called iMEPP, which identifies essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Concretely, we first integrate PPI data, gene expression data and Gene Ontology annotations to construct weighted PINs, which mitigates the high false-positive rate of the raw PPI data. Then, we define the influence score of each node in a PIN using both orthology data and PIN topological information. Finally, we develop an influence discount algorithm that identifies essential proteins based on the influence maximization mechanism.
We applied our method to identify essential proteins from the Saccharomyces cerevisiae PIN. Experiments show that iMEPP outperforms existing methods, which validates its effectiveness and advantage.
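The three steps named in the abstract (weighted PIN construction, influence scoring, and influence discount selection) can be illustrated with a minimal Python sketch. The abstract does not give the actual iMEPP formulas, so everything here is an assumption made for illustration: the function names, the product used to combine co-expression and GO similarity into an edge weight, the linear mix controlled by `alpha`, and the `discount` factor are all hypothetical. The selection loop follows the general degree-discount idea from influence maximization, not necessarily the authors' algorithm.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]
EdgeScores = Dict[FrozenSet[str], float]


def build_weighted_pin(edges: Iterable[Tuple[str, str]],
                       coexpr: EdgeScores,
                       go_sim: EdgeScores) -> Graph:
    """Weight each PPI edge by co-expression times GO similarity.

    Assumption: a plain product. Edges with no supporting evidence get
    weight 0 and are dropped, which is one way to filter likely false positives.
    """
    graph: Graph = {}
    for u, v in edges:
        key = frozenset((u, v))
        weight = coexpr.get(key, 0.0) * go_sim.get(key, 0.0)
        if weight > 0.0:
            graph.setdefault(u, {})[v] = weight
            graph.setdefault(v, {})[u] = weight
    return graph


def influence_scores(graph: Graph,
                     orthology: Dict[str, float],
                     alpha: float = 0.5) -> Dict[str, float]:
    """Mix an orthology score with weighted degree (hypothetical linear mix)."""
    return {
        node: alpha * orthology.get(node, 0.0)
        + (1.0 - alpha) * sum(neighbors.values())
        for node, neighbors in graph.items()
    }


def influence_discount(graph: Graph,
                       scores: Dict[str, float],
                       k: int,
                       discount: float = 0.5) -> List[str]:
    """Greedily pick k nodes, discounting neighbors of each picked node."""
    remaining = dict(scores)
    selected: List[str] = []
    for _ in range(min(k, len(remaining))):
        # Highest score first; ties are broken by name for determinism.
        best = min(remaining, key=lambda n: (-remaining[n], n))
        selected.append(best)
        del remaining[best]
        for neighbor, weight in graph[best].items():
            if neighbor in remaining:
                remaining[neighbor] -= discount * weight
    return selected
```

The returned list is an ordered set of candidate essential proteins; in practice the top `k` candidates would be compared against a curated list of known essential proteins to measure performance.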