Length-dependent prediction of protein intrinsic disorder.

Authors

Peng Kang, Radivojac Predrag, Vucetic Slobodan, Dunker A Keith, Obradovic Zoran

Affiliation

Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA.

Publication

BMC Bioinformatics. 2006 Apr 17;7:208. doi: 10.1186/1471-2105-7-208.

Abstract

BACKGROUND

Due to the functional importance of intrinsically disordered proteins and protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is that the amino acid compositions and sequence properties of disordered regions depend on region length.
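The 30-residue boundary between short and long disordered regions can be made concrete with a minimal sketch. It assumes a per-residue annotation string in which 'D' marks a disordered residue and 'O' an ordered one; that encoding and the function names `disordered_regions` and `split_by_length` are illustrative assumptions, not part of the published method.

```python
from itertools import groupby


def disordered_regions(labels):
    """Return half-open (start, end) index pairs for each run of 'D' in labels.

    Assumption: labels is a string of 'D' (disordered) and 'O' (ordered).
    """
    regions = []
    pos = 0
    for symbol, run in groupby(labels):
        length = len(list(run))
        if symbol == "D":
            regions.append((pos, pos + length))
        pos += length
    return regions


def split_by_length(regions, threshold=30):
    """Partition regions into short (<= threshold) and long (> threshold)."""
    short = [(s, e) for s, e in regions if e - s <= threshold]
    long_ = [(s, e) for s, e in regions if e - s > threshold]
    return short, long_


if __name__ == "__main__":
    # Hypothetical annotation: runs of 10, 31, and 30 disordered residues.
    labels = "O" * 5 + "D" * 10 + "O" * 20 + "D" * 31 + "O" * 4 + "D" * 30
    short, long_ = split_by_length(disordered_regions(labels))
    print("short:", short)
    print("long:", long_)
```

A region of exactly 30 residues counts as short here, matching the ≤30 and >30 definitions used in the abstract.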

RESULTS

We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length-dependency problem in the prediction of intrinsic protein disorder. These two predictors are similar to the original VSL1 predictor used in the CASP6 experiment. In both models, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final predictor model. In 10-fold cross-validation, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons on the VSL2 training dataset via 10-fold cross-validation and on a blind-test set of unrelated recent PDB chains indicated that the VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder.
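The two-level design described above can be sketched as follows. The abstract does not state how the meta predictor integrates the two specialized predictors, so this sketch assumes one plausible rule: the meta predictor outputs a per-residue weight in [0, 1] (read as the chance that the residue lies in a long disordered region), and the final score is the weighted average of the long-region and short-region scores. The function names, the 0.5 cutoff, and the 'D'/'O' output encoding are hypothetical.

```python
def combine_predictions(short_scores, long_scores, meta_weights):
    """Blend per-residue scores from two specialized predictors.

    Assumed rule (not taken from the abstract):
        final = w * long_score + (1 - w) * short_score
    where w is the meta predictor's per-residue weight in [0, 1].
    """
    if not (len(short_scores) == len(long_scores) == len(meta_weights)):
        raise ValueError("all inputs must have one value per residue")
    if any(w < 0.0 or w > 1.0 for w in meta_weights):
        raise ValueError("meta weights must lie in [0, 1]")
    return [
        w * l + (1.0 - w) * s
        for s, l, w in zip(short_scores, long_scores, meta_weights)
    ]


def call_disorder(scores, cutoff=0.5):
    """Turn per-residue scores into a 'D'/'O' string using a fixed cutoff."""
    return "".join("D" if p >= cutoff else "O" for p in scores)


if __name__ == "__main__":
    # Hypothetical outputs for a four-residue fragment.
    short_scores = [0.2, 0.8, 0.9, 0.1]
    long_scores = [0.6, 0.4, 0.7, 0.3]
    meta_weights = [0.0, 1.0, 0.5, 0.5]
    final = combine_predictions(short_scores, long_scores, meta_weights)
    print(final)
    print(call_disorder(final))
```

With a weight of 0 the final score equals the short-region predictor's output, and with a weight of 1 it equals the long-region predictor's output, so the meta predictor effectively chooses between the two specialists residue by residue.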

CONCLUSION

The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that are often misclassified by our previous disorder predictors. The success of the VSL2 predictors further confirmed the previously observed differences in amino acid compositions and sequence properties between short and long disordered regions, and justified our approaches for modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use at http://www.ist.temple.edu/disprot/predictorVSL2.php.
