Joint probabilistic-logical refinement of multiple protein feature predictors.

Affiliation

Department of Information Engineering and Computer Science, Università degli Studi di Trento, Trento, Italy.

Publication information

BMC Bioinformatics. 2014 Jan 15;15:16. doi: 10.1186/1471-2105-15-16.

Abstract

BACKGROUND

Computational methods for predicting protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers have invested considerable effort in designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result.

RESULTS

By including correlated features as inputs, existing methods rely on only one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features, and refines the raw predictions so that they violate the constraints as little as possible. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85-100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177-W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094-2095, 2008]), in a way that takes into account their respective strengths and weaknesses and requires no change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods.
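To make the idea of a consistency layer concrete, the following toy sketch refines two raw binary predictions for a single cysteine under one weighted rule. It is not the method used in the paper: the probabilities, the rule (a cysteine is not both disulfide-bonded and metal-bound), the weight, and the names `RAW`, `RULES`, `score`, and `refine` are all illustrative assumptions, and the exhaustive search stands in for whatever inference procedure a real system would use.

```python
import itertools
import math

# Hypothetical raw predictor outputs: P(feature is true) for one cysteine.
# Values must lie strictly between 0 and 1 so that the logarithms are defined.
RAW = {"disulfide": 0.6, "metal": 0.7}

# Hypothetical weighted rules: (weight, function returning True when satisfied).
# Illustrative constraint: a cysteine is not both disulfide-bonded and metal-bound.
RULES = [
    (2.0, lambda a: not (a["disulfide"] and a["metal"])),
]


def score(assignment, raw, rules):
    """Log-likelihood under the raw predictors minus the weights of violated rules."""
    total = 0.0
    for name, value in assignment.items():
        p = raw[name] if value else 1.0 - raw[name]
        total += math.log(p)
    for weight, satisfied in rules:
        if not satisfied(assignment):
            total -= weight
    return total


def refine(raw, rules):
    """Return the highest-scoring joint assignment, found by exhaustive search."""
    names = sorted(raw)
    best, best_score = None, -math.inf
    for values in itertools.product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        current = score(assignment, raw, rules)
        if current > best_score:
            best, best_score = assignment, current
    return best


refined = refine(RAW, RULES)      # joint decision under the rule
unrefined = refine(RAW, [])       # same search with no rules applied
```

With no rules, both features are predicted true because each raw probability exceeds 0.5. With the rule in place, the joint assignment keeps the more confident prediction (`metal`) and flips the weaker one (`disulfide`), which illustrates how a weighted constraint can trade off the predictors against each other without modifying them.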

CONCLUSIONS

The proposed framework improves the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and that our method can integrate them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. By contrast, the alternative strategies are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321-W326, 2004].
