Cross-validation of protein structural class prediction using statistical clustering and neural networks.

Author information

Metfessel B A, Saurugger P N, Connelly D P, Rich S S

Affiliation

Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis 55455.

Publication information

Protein Sci. 1993 Jul;2(7):1171-82. doi: 10.1002/pro.5560020712.

DOI:10.1002/pro.5560020712

PMID:8358300

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2142422/

Abstract

We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared with those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used as input to these algorithms consist of the normalized frequencies of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values, each protein is assigned a predicted structural class (all-alpha, all-beta, or alpha-beta). A set of 64 previously classified proteins was randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best-performing algorithm on the test sets was the learning vector quantization network with 17 inputs, which obtained a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes, whereas the differences between algorithms are in general not statistically significant. These results show that protein primary sequences contain easily obtainable information that is useful for predicting protein structural class, both by neural networks and by standard statistical clustering algorithms.
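
The evaluation protocol summarized above can be illustrated with a minimal Python sketch. It is not the authors' code and makes several loudly stated assumptions: it uses only the 20 amino acid composition features (the six hydrophobic patterns are not defined in the abstract and are omitted), it replaces the paper's modified Euclidean clustering with a plain nearest-centroid Euclidean classifier, and it assumes the 8-protein test sets are disjoint folds of the shuffled data (64 = 8 × 8), which the abstract does not state. All helper names (`composition`, `fit_centroids`, `predict`, `cross_validate`, `matthews`) are hypothetical.

```python
import random
from math import sqrt

# Standard one-letter codes for the 20 amino acid types.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("all-alpha", "all-beta", "alpha-beta")


def composition(seq):
    """Frequency of each standard amino acid, normalized by sequence length."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centroids(X, y):
    """Mean feature vector per class (plain Euclidean stand-in, not the
    paper's modified clustering algorithm)."""
    cents = {}
    for c in set(y):
        members = [x for x, label in zip(X, y) if label == c]
        cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return cents


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: euclidean(centroids[c], x))


def matthews(y_true, y_pred, cls):
    """One-vs-rest Matthews correlation coefficient for class `cls`
    (returns 0.0 when the denominator is zero)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cross_validate(X, y, n_test=8, seed=0):
    """Shuffle once, split into disjoint test folds of n_test proteins,
    train on the remaining proteins, and pool the held-out predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for start in range(0, len(idx), n_test):
        test = idx[start:start + n_test]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            y_true.append(y[i])
            y_pred.append(predict(cents, X[i]))
    return y_true, y_pred
```

With 64 labeled proteins, `X = [composition(s) for s in sequences]` followed by `cross_validate(X, labels)` yields 8 folds of 8 test proteins, each trained on the other 56. Pooling the held-out predictions before computing accuracy and the per-class Matthews coefficient counts every protein's test prediction exactly once; averaging per-fold scores instead is an equally plausible reading of the abstract.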
