Zhu Xianyou, He Xin, Kuang Linai, Chen Zhiping, Lancine Camara
College of Computer Science and Technology, Hengyang Normal University, Hengyang, China.
Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, China.
Front Genet. 2021 Oct 21;12:763153. doi: 10.3389/fgene.2021.763153. eCollection 2021.
Because traditional biological experiments are expensive and time-consuming, effective computational models for inferring potential essential proteins are important. This manuscript proposes CFMM, a novel method based on a collaborative filtering model. CFMM first constructs an updated protein-domain interaction (PDI) network by applying a collaborative filtering algorithm to the original PDI network. It then integrates topological features of the PDI networks with biological features of proteins and infers potential essential proteins with a scoring method based on an improved PageRank algorithm. The novelty of CFMM lies in the construction of the updated PDI network, the application of a commodity-customer-based collaborative filtering algorithm, and the introduction of a calculation method based on an improved PageRank algorithm; together, these allow CFMM to predict essential proteins without relying entirely on known protein-domain associations. In simulations on the DIP database, CFMM achieved prediction accuracies of 92.16, 83.14, 71.37, 63.87, 55.84, and 52.43% among the top 1, 5, 10, 15, 20, and 25% of predicted candidate key proteins, respectively, which are markedly higher overall than those of 14 competitive state-of-the-art predictive models. CFMM also performed satisfactorily on different databases under various evaluation measures, which further indicates that it may be a useful tool for identifying essential proteins.
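The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the CFMM implementation. It assumes a plain item-based (commodity-customer) collaborative filter with cosine similarity, where proteins play the role of customers and domains the role of commodities. It also assumes a generic PageRank with a prior vector in place of the paper's improved PageRank, and it reads "accuracy in the top x%" as the fraction of top-ranked candidates that are known essential proteins. All function names, parameters, and the toy data are hypothetical.

```python
from math import ceil, sqrt


def cosine_domain_similarity(pdi):
    """Cosine similarity between domains, computed from the proteins that carry them."""
    carriers = {}
    for protein, domains in pdi.items():
        for d in domains:
            carriers.setdefault(d, set()).add(protein)
    sim = {}
    for a in carriers:
        for b in carriers:
            if a != b:
                shared = len(carriers[a] & carriers[b])
                sim[a, b] = shared / sqrt(len(carriers[a]) * len(carriers[b]))
    return sim


def updated_pdi(pdi):
    """Keep known protein-domain links at 1.0; score missing links by item-based CF."""
    sim = cosine_domain_similarity(pdi)
    all_domains = {d for ds in pdi.values() for d in ds}
    scores = {}
    for protein, known in pdi.items():
        for d in all_domains:
            if d in known:
                scores[protein, d] = 1.0
            elif known:
                # Average similarity of the candidate domain to the known domains.
                scores[protein, d] = sum(sim[d, k] for k in known) / len(known)
            else:
                scores[protein, d] = 0.0
    return scores


def pagerank_with_prior(edges, prior, alpha=0.85, iters=100):
    """Power iteration for s = alpha * P^T s + (1 - alpha) * prior (undirected graph)."""
    nodes = sorted(prior)
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = sum(prior.values())
    p = {n: prior[n] / total for n in nodes}  # normalized prior (assumed total > 0)
    s = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * p[n] for n in nodes}
        for u in nodes:
            if neighbors[u]:
                share = alpha * s[u] / len(neighbors[u])
                for v in neighbors[u]:
                    nxt[v] += share
            else:
                # Isolated node: redistribute its mass according to the prior.
                for v in nodes:
                    nxt[v] += alpha * s[u] * p[v]
        s = nxt
    return s


def top_fraction_precision(scores, essential, fraction):
    """Fraction of the top-ranked candidates that are known essential proteins."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, ceil(len(ranked) * fraction))
    return sum(1 for n in ranked[:k] if n in essential) / k


# Toy data (hypothetical): four proteins, three domains, four interactions.
toy_pdi = {"P1": {"D1", "D2"}, "P2": {"D1", "D2"}, "P3": {"D2"}, "P4": {"D3"}}
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
toy_scores = updated_pdi(toy_pdi)
toy_prior = {p: sum(v for (q, _), v in toy_scores.items() if q == p) for p in toy_pdi}
toy_rank = pagerank_with_prior(toy_edges, toy_prior)
```

In this sketch the updated PDI scores enter only through the prior vector; how CFMM actually combines topological and biological features is specified in the full paper, not in the abstract.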