基于 SVM 模型预测病毒和人类之间的蛋白质-蛋白质相互作用。

Prediction of protein-protein interactions between viruses and human by an SVM model.

机构信息

School of Computer Science and Engineering, Inha University, Incheon, South Korea.

出版信息

BMC Bioinformatics. 2012 May 8;13 Suppl 7(Suppl 7):S5. doi: 10.1186/1471-2105-13-S7-S5.

DOI:10.1186/1471-2105-13-S7-S5

PMID:22595002

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3348049/

Abstract

BACKGROUND

Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species.

RESULTS

We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins.

CONCLUSIONS

Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.

摘要

背景

已经开发出几种从氨基酸序列预测蛋白质-蛋白质相互作用的计算方法，但这些方法大多数旨在预测同一物种内的相互作用，而不是预测不同物种之间的相互作用。用于预测同源蛋白质相互作用的方法不适合寻找异源蛋白质之间的相互作用，因为它们无法区分同一物种蛋白质之间的相互作用与不同物种蛋白质之间的相互作用。

结果

我们开发了一种新方法，用于将可变长度的蛋白质序列表示为固定长度的频率向量，该向量编码序列中三个连续氨基酸的相对频率。我们构建了一个支持向量机 (SVM) 模型来预测与人相互作用的病毒蛋白质。在两种类型的病毒，人乳头瘤病毒 (HPV) 和丙型肝炎病毒 (HCV) 中，我们的 SVM 模型的平均准确率超过 80%，高于具有不同表示方案的另一个 SVM 模型的准确率。使用 SVM 模型和蛋白质的基因本体论 (GO) 注释，我们预测了病毒蛋白和人类蛋白之间的新相互作用。

结论

对蛋白质序列的氨基酸三联体的相对频率进行编码是一种简单而强大的预测不同物种之间蛋白质-蛋白质相互作用的表示方法。该表示方法具有几个优点：(1) 它使预测模型能够实现优于其他表示方法的性能，(2) 它生成固定长度的特征向量，而与序列长度无关，以及 (3) 相同的表示适用于不同类型的蛋白质。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

基于 SVM 模型预测病毒和人类之间的蛋白质-蛋白质相互作用。

Prediction of protein-protein interactions between viruses and human by an SVM model.

机构信息

出版信息

BACKGROUND

RESULTS

CONCLUSIONS

背景

结果

结论

相似文献

引用本文的文献

本文引用的文献

基于 SVM 模型预测病毒和人类之间的蛋白质-蛋白质相互作用。

Prediction of protein-protein interactions between viruses and human by an SVM model.

机构信息

出版信息

BACKGROUND

RESULTS

CONCLUSIONS

背景

结果

结论

相似文献

引用本文的文献

本文引用的文献