School of Science, East China Jiaotong University, Nanchang, 330013, China.
School of Computer and Information Engineering, Nanning Normal University, Nanning, 530299, China.
Interdiscip Sci. 2021 Sep;13(3):349-361. doi: 10.1007/s12539-021-00426-7. Epub 2021 Mar 27.
Essential proteins are considered indispensable for sustaining normal physiological function and are crucial to drug design and disease diagnosis. Identifying them is of great importance for revealing molecular mechanisms and biological processes. Because biological experiments are laborious, many computational methods have been developed to discover essential proteins by mining features of high-throughput data. Appropriately integrating different types of biological information with the protein-protein interaction (PPI) network has proven useful for predicting essential proteins. This study aims to provide a comprehensive review of methods that identify essential proteins by integrating multi-source data, and to offer guidance for researchers. Current essential protein prediction algorithms are analyzed and compared in detail and tested on benchmark PPI networks. In addition, building on the previous method TEGS (short for network Topology, gene Expression, Gene ontology, and Subcellular localization), we propose CEGSO, which improves the prediction of essential proteins by incorporating known protein complex information, gene expression profiles, Gene Ontology (GO) term information, subcellular localization information, and protein orthology data into the PPI network. The simulation results show that CEGSO achieves more accurate and robust results than the compared methods across different test datasets and evaluation measures.
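The general idea of combining PPI network topology with other biological evidence can be illustrated with a minimal sketch. This is not the CEGSO or TEGS scoring formula, which the abstract does not give. It is a generic evidence-weighted degree ranking under stated assumptions: the function name `rank_proteins`, the inputs `edge_evidence` and `node_evidence`, and the mixing parameter `alpha` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_proteins(edges, edge_evidence, node_evidence, alpha=0.5):
    """Rank proteins by a blend of evidence-weighted degree and node evidence.

    edges: iterable of (u, v) protein pairs from a PPI network.
    edge_evidence: dict mapping frozenset({u, v}) -> weight in [0, 1]
        (hypothetical stand-in for co-expression or GO similarity).
    node_evidence: dict mapping protein -> score in [0, 1]
        (hypothetical stand-in for orthology, localization, or complex data).
    alpha: assumed mixing weight between the two components.
    """
    # Sum the evidence weights of all interactions touching each protein.
    weighted_degree = defaultdict(float)
    for u, v in edges:
        w = edge_evidence.get(frozenset((u, v)), 0.0)
        weighted_degree[u] += w
        weighted_degree[v] += w

    # Normalize by the largest weighted degree (fall back to 1.0 if all zero).
    max_wd = max(weighted_degree.values(), default=0.0) or 1.0

    scores = {}
    for protein in set(weighted_degree) | set(node_evidence):
        topo = weighted_degree.get(protein, 0.0) / max_wd
        scores[protein] = alpha * topo + (1 - alpha) * node_evidence.get(protein, 0.0)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented purely for illustration.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
edge_evidence = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.2,
    frozenset(("C", "D")): 0.1,
}
node_evidence = {"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.0}

ranking = rank_proteins(edges, edge_evidence, node_evidence)
print([protein for protein, _ in ranking])
```

In this toy setup, a protein ranks highly when its interactions are well supported and its own node-level evidence is strong. A real method would define each evidence term from actual expression, GO, localization, complex, and orthology data.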