In-silico prediction of disorder content using hybrid sequence representation.

Author information

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada.

Publication information

BMC Bioinformatics. 2011 Jun 17;12:245. doi: 10.1186/1471-2105-12-245.

Abstract

BACKGROUND

Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies, including estimating the abundance of disorder in protein families, classes, and complete proteomes, and analyzing disorder-related protein functions. These investigations currently rely on disorder content derived from per-residue disorder predictions. We show that such predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of disorder content.
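To make the baseline concrete, the sketch below shows one common way to derive disorder content from per-residue predictions: the fraction of residues whose predicted propensity reaches a cutoff. The function name `disorder_content`, the 0.5 default cutoff, and the toy scores are illustrative assumptions, not values taken from the paper.

```python
def disorder_content(propensities, threshold=0.5):
    """Return the fraction of residues predicted as disordered.

    `propensities` is a list of per-residue scores in [0, 1]; the 0.5
    default threshold is an assumption made for this illustration.
    """
    if not propensities:
        raise ValueError("the sequence must contain at least one residue")
    disordered = sum(1 for p in propensities if p >= threshold)
    return disordered / len(propensities)


# Hypothetical per-residue propensities for a 10-residue protein.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.55, 0.45]

content_default = disorder_content(scores)         # cutoff 0.5
content_strict = disorder_content(scores, 0.65)    # stricter cutoff
print(content_default, content_strict)
```

The two calls give different content values for the same protein, which illustrates how a content estimate derived this way depends on the chosen cutoff as well as on the per-residue scores.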

RESULTS

We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than content extracted from local, window-based, residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error of 0.05 and a high Pearson correlation of 0.68. This is a statistically significant improvement over the content computed from the outputs of ten modern disorder predictors on a test dataset of proteins that share low sequence identity with the training sequences. We analyze the proposed predictive model to discuss factors related to the prediction of disorder content.
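The modelling step described above (sequence-level descriptors fed to a ridge regression, evaluated by mean squared error and Pearson correlation) can be sketched in a few lines. This is a generic closed-form ridge regression on toy data, not the DisCon implementation: the two-column descriptor matrix, the target values, the regularization strength `lam`, and all function names are assumptions for illustration only.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge: w = (Xc^T Xc + lam*I)^-1 Xc^T yc on centered data.

    Centering keeps the intercept out of the penalty.
    """
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(A, b)
    intercept = y_mean - sum(wi * mi for wi, mi in zip(w, mu))
    return w, intercept


def ridge_predict(X, w, intercept):
    return [intercept + sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Toy data: each row holds two hypothetical sequence-level descriptors,
# and y is a made-up disorder content (fraction of disordered residues).
X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.9], [0.8, 0.3]]
y = [0.30, 0.60, 0.45, 0.38, 0.56]

w, intercept = ridge_fit(X, y, lam=0.1)
y_hat = ridge_predict(X, w, intercept)
print("MSE:", mse(y, y_hat), "PCC:", pearson(y, y_hat))
```

Here the metrics are computed on the training rows only to keep the sketch short; the evaluation reported above uses a separate test dataset with low sequence identity to the training sequences.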

CONCLUSIONS

DisCon is a high-quality alternative for high-throughput annotation of disorder content. We also demonstrate empirically that DisCon's predictions can be used to improve the binary annotations of disordered residues derived from the real-valued disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
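One plausible way to use a predicted content value when binarizing real-valued propensities is to label the top-scoring residues as disordered until the labeled fraction matches the predicted content, instead of applying a fixed cutoff. The sketch below illustrates that idea only; the abstract does not specify the procedure, so the function `binarize_by_content` and its ranking rule are assumptions, not the authors' method.

```python
def binarize_by_content(propensities, predicted_content):
    """Label the k highest-propensity residues as disordered (1), others 0.

    k is the predicted content times the sequence length, rounded and
    clamped to [0, length]. This ranking rule is an illustrative
    assumption, not a procedure taken from the paper.
    """
    n = len(propensities)
    k = max(0, min(n, round(predicted_content * n)))
    # Indices sorted by decreasing propensity; ties keep sequence order.
    order = sorted(range(n), key=lambda i: propensities[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


# Hypothetical propensities from a residue-level predictor and a
# hypothetical sequence-level content prediction of 0.3.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.55, 0.45]
labels = binarize_by_content(scores, 0.3)
print(labels)
```

With a fixed 0.5 cutoff the same scores would mark five residues as disordered, so the content-guided rule changes the binary annotation whenever the two estimates disagree.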
