Software Department, University of Babylon, Babylon, Hillah, Iraq.
Computer Science Department, University of Kerbala, Babylon, Hillah, Iraq.
J Bioinform Comput Biol. 2021 Jun;19(3):2150009. doi: 10.1142/S0219720021500098. Epub 2021 Apr 28.
Identifying protein complexes in the cell is important for understanding the mechanisms of cellular processes, since complexes carry out many of the molecular functions in these processes. Most proposed algorithms predict a complex as a dense region of a Protein-Protein Interaction (PPI) network. Others weight the network using gene expression or Gene Ontology (GO) data. These approaches, however, discard proteins that lack gene expression data, together with their edges, which can lose important topological relations. To address these limitations, this study proposes a method called Gene Expression and Core-Attachment (GECA). GECA is a new technique that identifies core proteins using common-neighbor measures and biological information. It also improves the attachment step by adding proteins that have low closeness to the core proteins but high gene-expression similarity to them. Compared with several existing methods, GECA achieved the highest F-measure on most datasets, and evaluation of the complexes it predicts shows high biological significance.
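The core-attachment idea described above can be illustrated with a small sketch. The abstract does not give GECA's actual formulas, so everything here is an assumption for illustration only, not the authors' algorithm. Core membership uses Jaccard similarity of closed neighborhoods (a common-neighbor measure) between a hypothetical seed protein and each of its neighbors. "Closeness" of a candidate is taken as the fraction of core proteins it interacts with. Expression similarity is Pearson correlation. The function names and the thresholds `core_t`, `close_t`, and `expr_t` are hypothetical. Proteins without expression data stay in the network and are judged on topology alone, which matches the motivation stated in the abstract.

```python
from math import sqrt

# Hypothetical sketch of a core-attachment step; not GECA's published method.


def closed_nbrs(graph, u):
    """Neighbors of u plus u itself (closed neighborhood)."""
    return graph[u] | {u}


def jaccard(graph, u, v):
    """Common-neighbor similarity: |N[u] & N[v]| / |N[u] | N[v]|."""
    a, b = closed_nbrs(graph, u), closed_nbrs(graph, v)
    return len(a & b) / len(a | b)


def pearson(x, y):
    """Pearson correlation of two expression profiles; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find_core(graph, seed, core_t):
    """Core = seed plus neighbors whose Jaccard similarity to the seed is >= core_t."""
    core = {seed}
    for v in graph[seed]:
        if jaccard(graph, seed, v) >= core_t:
            core.add(v)
    return core


def attach(graph, expr, core, close_t, expr_t):
    """Add neighbors of the core that are topologically close, or, failing that,
    whose mean expression correlation with the core is >= expr_t.
    Proteins missing from `expr` are never dropped; they just can't use the expression route."""
    complex_ = set(core)
    candidates = set().union(*(graph[c] for c in core)) - core
    for p in sorted(candidates):
        closeness = len(graph[p] & core) / len(core)
        if closeness >= close_t:
            complex_.add(p)
            continue
        # Low closeness: fall back to expression similarity if data exist.
        if p in expr:
            sims = [pearson(expr[p], expr[c]) for c in core if c in expr]
            if sims and sum(sims) / len(sims) >= expr_t:
                complex_.add(p)
    return complex_


# Toy undirected PPI network (adjacency sets) and expression profiles.
# "D" has no expression data but can still join the core through topology.
EXAMPLE_GRAPH = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E", "F", "G"},
    "E": {"D"},
    "F": {"D"},
    "G": {"D"},
}
EXAMPLE_EXPR = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 3, 5, 7],
    "E": [0, 1, 2, 3],  # co-expressed with the core
    "G": [4, 3, 2, 1],  # anti-correlated with the core
}

if __name__ == "__main__":
    core = find_core(EXAMPLE_GRAPH, "A", core_t=0.5)
    print(sorted(core))
    print(sorted(attach(EXAMPLE_GRAPH, EXAMPLE_EXPR, core, close_t=0.5, expr_t=0.8)))
```

In this toy run, E, F, and G each touch only one of the four core proteins (closeness 0.25), so none is attached on topology. E is still attached because its profile correlates perfectly with the core, G is rejected because it is anti-correlated, and F is rejected because it has no expression data to compensate for its low closeness.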