Predicting Protein Disorder for N-, C-, and Internal Regions.

Author information

Li X, Romero P, Rani M, Dunker AK, Obradovic Z

Publication information

Genome Inform Ser Workshop Genome Inform. 1999;10:30-40.

Abstract

Logistic regression (LR), discriminant analysis (DA), and neural networks (NN) were used to predict ordered and disordered regions in proteins. Training data came from a set of non-redundant X-ray crystal structures and were partitioned into N-terminal (N), C-terminal (C), and internal (I) regions. The DA and LR methods gave almost identical 5-fold cross-validation accuracies, averaging 75.9 ± 3.1% (N-regions), 70.7 ± 1.5% (I-regions), and 74.6 ± 4.4% (C-regions). NN predictions scored slightly higher: 78.8 ± 1.2% (N-regions), 72.5 ± 1.2% (I-regions), and 75.3 ± 3.3% (C-regions). Prediction accuracy improved with the length of the disordered region. Averaged over the three methods, accuracy rose from 52% (length 9-14) to 78% (length ≥ 21) for I-regions, from 72% (length 5) to 81% (length 12-15) for N-regions, and from 70% (length 5) to 80% (length 12-15) for C-regions. These data support the hypothesis that disorder is encoded by the amino acid sequence.
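
The "mean ± spread over 5 folds" figures in the abstract can be illustrated with a minimal sketch of 5-fold cross-validation around a logistic regression classifier. Everything here is an assumption for illustration: the toy two-feature data, the gradient-descent trainer, the function names, and the use of the sample standard deviation across folds as the ± value. The abstract does not state the features, the fitting procedure, or how its ± values were computed.

```python
import math
import random
import statistics


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return 1 (disordered) if the predicted probability is at least 0.5."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0


def cross_validate(X, y, k=5, seed=0):
    """Return per-fold accuracies of logistic regression under k-fold CV."""
    folds = k_fold_indices(len(X), k, random.Random(seed))
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_toy_data(n=200, seed=1):
    """Synthetic two-feature data; NOT the paper's features or structures."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = "disordered", 0 = "ordered"
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    acc = cross_validate(X, y, k=5)
    mean_pct = 100 * statistics.mean(acc)
    sd_pct = 100 * statistics.stdev(acc)
    print(f"accuracy: {mean_pct:.1f} +/- {sd_pct:.1f}%")
```

Each fold is held out once while the model is trained on the other four, so every example is scored by a model that never saw it. Running the same loop separately on N-terminal, C-terminal, and internal examples would give one mean and spread per region type, matching the shape of the results reported above.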
