Toward better understanding of protein secondary structure: extracting prediction rules.

Affiliation

Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):858-64. doi: 10.1109/TCBB.2010.16.

Abstract

Although numerous computational techniques have been applied to predict protein secondary structure (PSS), only limited studies have dealt with discovery of logic rules underlying the prediction itself. Such rules offer interesting links between the prediction model and the underlying biology. In addition, they enhance interpretability of PSS prediction by providing a degree of transparency to the predicting model usually regarded as a black box. In this paper, we explore the generation and use of C4.5 decision trees to extract relevant rules from PSS predictions modeled with two-stage support vector machines (TS-SVM). The proposed rules were derived on the RS126 data set of 126 nonhomologous globular proteins and on the PSIPRED data set of 1,923 protein sequences. Our approach has produced sets of comprehensible, and often interpretable, rules underlying the PSS predictions. Moreover, many of the rules seem to be strongly supported by biological evidence. Further, our approach resulted in good prediction accuracy, few and usually compact rules, and rules that are generally of higher confidence levels than those generated by other rule extraction techniques.
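
The general idea of extracting rules from a black-box predictor can be sketched as follows: label residue windows with the trained model, fit a decision tree to those model-assigned labels, and read each root-to-leaf path as a rule. The abstract does not describe the pipeline in detail, so everything below is an illustrative assumption: `black_box` is a toy stand-in for the TS-SVM, the sequence and the window width of 3 are made up, and the tree is a plain ID3-style learner using information gain, not C4.5 itself (no gain ratio, no pruning).

```python
import math
from collections import Counter

# Hypothetical stand-in for a trained TS-SVM: maps a residue window to a
# secondary-structure label (H = helix, E = strand, C = coil).
def black_box(window):
    centre = window[len(window) // 2]
    if centre in "AELM":
        return "H"
    if centre in "VIY":
        return "E"
    return "C"

def windows(sequence, width=3):
    """Slide a fixed-width window along the sequence."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, labels, positions):
    """ID3-style tree: split on the window position with the highest information gain."""
    if len(set(labels)) == 1 or not positions:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label

    def gain(pos):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[pos], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(positions, key=gain)
    branches = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = branches.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    rest = [p for p in positions if p != best]
    return (best, {v: build_tree(r, l, rest) for v, (r, l) in branches.items()})

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into a (conditions, label) rule."""
    if isinstance(tree, str):
        return [(conditions, tree)]
    pos, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((pos, value),)))
    return rules

def predict(tree, window):
    """Follow the tree; return None if a residue was never seen at a split."""
    while not isinstance(tree, str):
        pos, branches = tree
        tree = branches.get(window[pos])
        if tree is None:
            return None
    return tree

sequence = "MKVLAAGIVEYLNPDGSAELVIKM"          # made-up sequence
rows = windows(sequence)
labels = [black_box(w) for w in rows]          # labels come from the model
tree = build_tree(rows, labels, positions=[0, 1, 2])
rules = extract_rules(tree)

# Fidelity: how often the extracted tree agrees with the black box.
fidelity = sum(predict(tree, w) == y for w, y in zip(rows, labels)) / len(rows)

for conditions, label in rules:
    clause = " AND ".join(f"residue[{pos}] == '{value}'" for pos, value in conditions)
    print(f"IF {clause} THEN {label}")
print(f"fidelity to black box: {fidelity:.2f}")
```

Because the tree is trained on the model's outputs rather than on experimentally assigned structures, the rules describe what the predictor has learned; fidelity measures how faithfully they do so, which is a separate question from prediction accuracy.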