PCI-SS: MISO dynamic nonlinear protein secondary structure prediction.

Authors

Green James R, Korenberg Michael J, Aboul-Magd Mohammed O

Affiliation

Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada.

Publication

BMC Bioinformatics. 2009 Jul 17;10:222. doi: 10.1186/1471-2105-10-222.

Abstract

BACKGROUND

Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing alpha-helices, beta-strands, and non-regular structures) from primary sequence data. The approach makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.

RESULTS

Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built using two layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, PCI-based prediction is offered here as a web service with both human- and machine-readable interfaces. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface permits remote invocation of the PCI-SS service. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used both to represent the input protein sequence data and to encode the resulting structure prediction in a machine-readable format. To our knowledge, this is the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition.
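
As an illustration of how three binary sub-problems can be recombined into a three-state call, the sketch below assumes a one-vs-rest decomposition (helix vs. rest, strand vs. rest, coil vs. rest) and picks the highest-scoring state per residue. The abstract does not say which decomposition or combination rule PCI-SS uses, so the function name `combine_binary_scores`, the score inputs, and the argmax rule are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence

# Three-state alphabet: H = alpha-helix, E = beta-strand, C = non-regular (coil).
STATES = ("H", "E", "C")


def combine_binary_scores(
    h_scores: Sequence[float],
    e_scores: Sequence[float],
    c_scores: Sequence[float],
) -> str:
    """Combine three per-residue binary classifier scores into one label string.

    Assumption: each input holds one-vs-rest scores, where a larger value
    means the residue is more likely to be in that state.
    """
    if not (len(h_scores) == len(e_scores) == len(c_scores)):
        raise ValueError("all score sequences must have the same length")

    labels = []
    for triple in zip(h_scores, e_scores, c_scores):
        # Index of the largest score; ties resolve to the earliest state in STATES.
        best = max(range(len(STATES)), key=lambda i: triple[i])
        labels.append(STATES[best])
    return "".join(labels)
```

In a two-layer design, the string returned here (or the raw scores) would be the kind of signal a second-layer classifier could refine; that second stage is not shown.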

CONCLUSION

Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds its greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall three-state accuracy (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
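
The two measures named here can be made concrete with a short sketch. Q3 is conventionally the fraction of residues whose three-state label is predicted correctly. The abstract does not define the BAD score; the code assumes it counts helix/strand confusions (observed H predicted as E, or observed E predicted as H) as a fraction of all residues, which is an assumption rather than the paper's stated definition. The function names `q3` and `bad_rate` are illustrative.

```python
def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label (H/E/C) is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def bad_rate(observed: str, predicted: str) -> float:
    """Fraction of residues where helix and strand are confused with each other.

    Assumption: 'BAD' errors are H predicted as E or E predicted as H,
    normalized by the total number of residues.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    confusions = {("H", "E"), ("E", "H")}
    bad = sum((o, p) in confusions for o, p in zip(observed, predicted))
    return bad / len(observed)
```

For example, with observed labels `HHHEEECCC` and predicted labels `HHEEEHCCC`, seven of nine residues match and two are helix/strand confusions, so `q3` gives 7/9 and `bad_rate` gives 2/9. A consensus step that turns a helix/strand confusion into a coil prediction would lower `bad_rate` without necessarily changing `q3`.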
