Cross-validation of protein structural class prediction using statistical clustering and neural networks.

Authors

Metfessel B A, Saurugger P N, Connelly D P, Rich S S

Affiliation

Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis 55455.

Publication information

Protein Sci. 1993 Jul;2(7):1171-82. doi: 10.1002/pro.5560020712.

Abstract

We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all-alpha, all-beta, or alpha-beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
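As a rough illustration of the quantities named in the abstract, the sketch below computes a normalized amino acid composition vector, draws one random 56/8 train/test split, and evaluates a one-vs-rest Matthews correlation coefficient for a single structural class. It is not the authors' implementation: the function names, the fixed random seed, and the one-vs-rest treatment of the three classes are assumptions made here for illustration, and the six hydrophobic patterns are omitted because the abstract does not define them.

```python
import math
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the normalized frequency of each of the 20 amino acids.

    Non-standard characters are ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def split_train_test(items, n_test=8, seed=0):
    """Randomly hold out n_test items; return (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def matthews_cc(true_labels, predicted_labels, positive):
    """One-vs-rest Matthews correlation coefficient for class `positive`."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Repeating `split_train_test` with different seeds over the 64 proteins would give multiple training and test sets of the sizes the abstract describes; any classifier can then be scored per class with `matthews_cc`.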

Similar articles

8
Protein secondary structure prediction with SPARROW.
J Chem Inf Model. 2012 Feb 27;52(2):545-56. doi: 10.1021/ci200321u. Epub 2012 Jan 23.

Cited by

3
Characterization of protein secondary structure from NMR chemical shifts.
Prog Nucl Magn Reson Spectrosc. 2009 Apr 5;54(3-4):141-165. doi: 10.1016/j.pnmrs.2008.06.002.
5
Prediction of protein structural class with Rough Sets.
BMC Bioinformatics. 2006 Jan 14;7:20. doi: 10.1186/1471-2105-7-20.
