Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation.

Affiliations

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China.

School of Information Engineering, Xijing University, Xi'an 710123, China.

Publication information

Comput Math Methods Med. 2022 Feb 22;2022:7191684. doi: 10.1155/2022/7191684. eCollection 2022.

Abstract

Protein-protein interactions (PPIs) play a crucial role in understanding disease pathogenesis and genetic mechanisms, in guiding drug design, and in other biochemical processes; thus, the identification of PPIs is of great importance. With the rapid development of high-throughput sequencing technology, a large amount of PPI sequence data has accumulated. Researchers have designed many methods to detect PPIs from these sequence data, and PPI prediction has become a research hotspot in proteomics. However, because traditional experimental methods are both time-consuming and costly, it is difficult to analyze and predict the massive amount of PPI data quickly and accurately. To address these issues, many computational systems based on machine learning have been applied to PPI prediction, improving the overall recognition rate. In this paper, we present a novel and efficient computational method that implements a protein interaction prediction system using only protein sequence information. First, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) was used to generate, from each protein sequence, a position-specific scoring matrix (PSSM) containing the protein's evolutionary information. Second, we used a novel feature representation scheme, MatFLDA, to extract the essential information from the PSSM of each protein sequence, and obtained five training and five testing datasets by five-fold cross-validation. Finally, the random fern (RFs) classifier was employed to infer interactions between proteins, yielding a model called MatFLDA_RFs. The proposed MatFLDA_RFs model achieved 95.03% average accuracy on one dataset and 85.35% average accuracy on the other, outperforming other existing computational methods.
The experimental results indicate that the proposed method yields better PPI predictions and provides an effective tool for detecting new PPIs and for in-depth study of proteomics. In addition, we developed a web server for the proposed model to predict protein-protein interactions, which is freely accessible online at http://120.77.11.78:5001/webserver/MatFLDA_RFs.
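The abstract names the random fern classifier and five-fold cross-validation but gives no implementation details. As an illustration only, the sketch below is a minimal, hypothetical pure-Python random-ferns classifier (the usual semi-naive Bayes formulation: groups of random binary threshold tests whose leaf likelihoods are multiplied across ferns) evaluated with five-fold cross-validation. It is not the authors' MatFLDA_RFs code: the names `RandomFerns`, `five_fold_accuracy` and `make_toy_data`, the test form `x[feature] > threshold`, the parameters `n_ferns` and `depth`, and the synthetic Gaussian features standing in for MatFLDA-processed PSSM vectors are all assumptions.

```python
import math
import random


class RandomFerns:
    """Hypothetical minimal random-ferns classifier (not the authors' code).

    Each fern applies `depth` random tests of the form x[feature] > threshold.
    The bit pattern of the outcomes indexes one of 2**depth leaves, and per-class
    leaf counts (with add-one smoothing) estimate P(leaf | class). Ferns are
    combined semi-naively: their log-likelihoods are summed per class.
    """

    def __init__(self, n_ferns=30, depth=5, seed=0):
        self.n_ferns = n_ferns
        self.depth = depth
        self.rng = random.Random(seed)

    def _leaf(self, fern, x):
        # Pack the binary test outcomes into a single leaf index.
        idx = 0
        for feat, thr in fern:
            idx = (idx << 1) | (1 if x[feat] > thr else 0)
        return idx

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_feat, n_leaves = len(X[0]), 1 << self.depth
        self.ferns, self.log_probs = [], []
        for _ in range(self.n_ferns):
            # Random test: a random feature, thresholded at that feature's
            # value in a randomly chosen training sample.
            fern = []
            for _ in range(self.depth):
                f = self.rng.randrange(n_feat)
                fern.append((f, X[self.rng.randrange(len(X))][f]))
            counts = {c: [1] * n_leaves for c in self.classes}  # add-one smoothing
            for xi, yi in zip(X, y):
                counts[yi][self._leaf(fern, xi)] += 1
            log_p = {}
            for c, cnt in counts.items():
                total = sum(cnt)
                log_p[c] = [math.log(v / total) for v in cnt]  # log P(leaf | class)
            self.ferns.append(fern)
            self.log_probs.append(log_p)
        return self

    def predict(self, X):
        # Sum per-fern log-likelihoods; class priors are ignored (assumes balanced classes).
        preds = []
        for x in X:
            scores = dict.fromkeys(self.classes, 0.0)
            for fern, log_p in zip(self.ferns, self.log_probs):
                leaf = self._leaf(fern, x)
                for c in self.classes:
                    scores[c] += log_p[c][leaf]
            preds.append(max(self.classes, key=scores.get))
        return preds


def five_fold_accuracy(X, y, k=5, seed=0):
    """Average accuracy over k folds; each fold serves once as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = RandomFerns(seed=i).fit([X[j] for j in train], [y[j] for j in train])
        pred = model.predict([X[j] for j in test])
        accs.append(sum(p == y[j] for p, j in zip(pred, test)) / len(test))
    return sum(accs) / k


def make_toy_data(n=200, dim=10, seed=42):
    """Synthetic stand-in for protein-pair feature vectors: two Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # toy labels: 1 = "interacting", 0 = "non-interacting"
        mu = 1.0 if label else -1.0
        X.append([rng.gauss(mu, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 5-fold accuracy: {five_fold_accuracy(X, y):.3f}")
```

Ferns are attractive for this kind of task because training is a single counting pass per fern and prediction is a table lookup, so many ferns can be combined cheaply. In a real pipeline, `X` would hold one fixed-length vector per protein pair derived from the two proteins' PSSMs; how MatFLDA produces those vectors is not described in the abstract and is not modeled here.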


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7923/8888042/be1f06e4970a/CMMM2022-7191684.001.jpg