An analysis of protein folding type prediction by seed-propagated sampling and jackknife test.

Authors

Zhang C T, Chou K C

Affiliation

Department of Physics, Tianjin University, China.

Publication

J Protein Chem. 1995 Oct;14(7):583-93. doi: 10.1007/BF01886884.

Abstract

In developing statistical methods for predicting protein folding types, a crucial problem is how to test the predicted results. In addition to the resubstitution test, in which the folding type of each protein in a training set is predicted using rules derived from that same set, cross-validation tests are needed. Among these, the single-test-set method seems to be the least reliable because the choice of test set is arbitrary. The leaving-one-out (or jackknife) test is more objective and hence more reliable, but removing each protein in turn from the training set may cause severe information loss when the training set is not large enough. To overcome this drawback, a seed-propagated sampling approach is proposed that can generate any number of simulated proteins of a desired type from a given training-set database. It requires no predetermined assumption about the statistical distribution function of the amino acid frequencies. Combined with the existing cross-validation methods, the new technique may provide a more objective assessment of various protein-folding-type prediction methods.
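
The difference between the resubstitution test and the leaving-one-out (jackknife) test can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier is a simple stand-in and not the prediction rule studied in the paper, the names `centroid_classifier`, `resubstitution_accuracy`, and `jackknife_accuracy` are hypothetical, and the two-component vectors in `toy_data` are invented placeholders for amino acid frequency vectors. The seed-propagated sampling procedure itself is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict


def centroid_classifier(train):
    """Build a nearest-centroid predictor (illustrative stand-in, not the paper's rule)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # Per-class mean vector of the training proteins.
    centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

    def predict(vec):
        def dist2(center):
            return sum((a - b) ** 2 for a, b in zip(vec, center))
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda lab: dist2(centroids[lab]))

    return predict


def resubstitution_accuracy(data):
    """Train on the full set, then predict every protein in that same set."""
    predict = centroid_classifier(data)
    return sum(predict(vec) == label for vec, label in data) / len(data)


def jackknife_accuracy(data):
    """Leave each protein out in turn, train on the rest, predict the held-out one."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # training set without protein i
        predict = centroid_classifier(train)
        hits += predict(vec) == label
    return hits / len(data)


# Invented toy data: (feature vector, folding type). Real inputs would be
# amino acid frequency vectors; these numbers are placeholders only.
toy_data = [
    ([0.9, 0.1], "alpha"), ([0.8, 0.2], "alpha"), ([0.7, 0.3], "alpha"),
    ([0.1, 0.9], "beta"), ([0.2, 0.8], "beta"), ([0.3, 0.7], "beta"),
]

if __name__ == "__main__":
    print("resubstitution:", resubstitution_accuracy(toy_data))
    print("jackknife:", jackknife_accuracy(toy_data))
```

In the jackknife loop each prediction is made by rules derived from one protein fewer than the full set, which is the information loss the abstract refers to when the training set is small.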
