Toward better understanding of protein secondary structure: extracting prediction rules.

Affiliation

Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):858-64. doi: 10.1109/TCBB.2010.16.

Abstract

Although numerous computational techniques have been applied to predict protein secondary structure (PSS), only limited studies have dealt with discovery of logic rules underlying the prediction itself. Such rules offer interesting links between the prediction model and the underlying biology. In addition, they enhance interpretability of PSS prediction by providing a degree of transparency to the predicting model usually regarded as a black box. In this paper, we explore the generation and use of C4.5 decision trees to extract relevant rules from PSS predictions modeled with two-stage support vector machines (TS-SVM). The proposed rules were derived on the RS126 data set of 126 nonhomologous globular proteins and on the PSIPRED data set of 1,923 protein sequences. Our approach has produced sets of comprehensible, and often interpretable, rules underlying the PSS predictions. Moreover, many of the rules seem to be strongly supported by biological evidence. Further, our approach resulted in good prediction accuracy, few and usually compact rules, and rules that are generally of higher confidence levels than those generated by other rule extraction techniques.
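The abstract compares extracted rules by their confidence levels. As a rough illustration only, the sketch below shows one way the coverage and confidence of a single IF-THEN rule could be measured against the labels produced by a predictor. It does not implement C4.5 or the TS-SVM; the sliding-window encoding, the function names, the toy sequence, the labels, and the rule itself are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Window = str  # fixed-length stretch of residues centered on the target position


def sliding_windows(sequence: str, half_width: int, pad: str = "-") -> List[Window]:
    """Return one window per residue, padding both ends of the sequence."""
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def rule_stats(
    windows: List[Window],
    labels: List[str],
    condition: Callable[[Window], bool],
    predicted_class: str,
) -> Tuple[int, float]:
    """Return (coverage, confidence) of the rule 'IF condition THEN predicted_class'.

    coverage   = number of windows that satisfy the condition
    confidence = fraction of covered windows whose label equals predicted_class
    """
    covered = [lab for win, lab in zip(windows, labels) if condition(win)]
    if not covered:
        return 0, 0.0
    hits = sum(1 for lab in covered if lab == predicted_class)
    return len(covered), hits / len(covered)


# Toy data, invented for illustration. The labels stand in for the output
# of a black-box predictor (H = helix, E = strand, C = coil).
sequence = "AALEKAAGPVVV"
black_box_labels = list("HHHHHHCCCEEE")

windows = sliding_windows(sequence, half_width=1)

# Hypothetical rule: alanine at the central position implies helix.
coverage, confidence = rule_stats(
    windows, black_box_labels, lambda w: w[1] == "A", "H"
)
print(coverage, confidence)  # prints: 4 0.75
```

In this toy case the rule covers four residues and agrees with the predictor on three of them. A rule learner such as a decision tree would, in effect, search over many such conditions and keep those with high confidence and adequate coverage.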

Similar articles

Rule generation for protein secondary structure prediction with support vector machines and decision tree.

IEEE Trans Nanobioscience. 2006 Mar;5(1):46-53. doi: 10.1109/tnb.2005.864021.

Two-stage multi-class support vector machines to protein secondary structure prediction.

Pac Symp Biocomput. 2005:346-57. doi: 10.1142/9789812702456_0033.

Prediction of Protein Secondary Structure with two-stage multi-class SVMs.

Int J Data Min Bioinform. 2007;1(3):248-69. doi: 10.1504/ijdmb.2007.011612.

Knowledge acquisition and development of accurate rules for predicting protein stability changes.

Comput Biol Chem. 2006 Dec;30(6):408-15. doi: 10.1016/j.compbiolchem.2006.06.004. Epub 2006 Sep 26.

A high-accuracy protein structural class prediction algorithm using predicted secondary structural information.

J Theor Biol. 2010 Dec 7;267(3):272-5. doi: 10.1016/j.jtbi.2010.09.007. Epub 2010 Sep 8.

Predicting the state of cysteines based on sequence information.

J Theor Biol. 2010 Dec 7;267(3):312-8. doi: 10.1016/j.jtbi.2010.09.002. Epub 2010 Sep 6.

Two multi-classification strategies used on SVM to predict protein structural classes by using auto covariance.

Interdiscip Sci. 2009 Dec;1(4):315-9. doi: 10.1007/s12539-009-0066-1. Epub 2009 Nov 14.

Predicting protein secondary structure by a support vector machine based on a new coding scheme.

Genome Inform. 2004;15(2):181-90.

Improving protein secondary structure prediction using a multi-modal BP method.

Comput Biol Med. 2011 Oct;41(10):946-59. doi: 10.1016/j.compbiomed.2011.08.005. Epub 2011 Aug 30.

Cited by

DeepACLSTM: deep asymmetric convolutional long short-term memory neural models for protein secondary structure prediction.

BMC Bioinformatics. 2019 Jun 17;20(1):341. doi: 10.1186/s12859-019-2940-0.
