Domain prediction with probabilistic directional context.

Author information

Ochoa Alejandro, Singh Mona

Affiliations

Lewis-Sigler Institute for Integrative Genomics.

Center for Statistics and Machine Learning.

Publication information

Bioinformatics. 2017 Aug 15;33(16):2471-2478. doi: 10.1093/bioinformatics/btx221.

Abstract

MOTIVATION

Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. Although domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance by rewarding domain pairs that frequently co-occur within sequences. However, most of these approaches have ignored the order in which domains preferentially co-occur and have also not modeled domain co-occurrence probabilistically.

RESULTS

We introduce a probabilistic approach for domain prediction that models 'directional' domain context. Our method is the first to score all domain pairs within a sequence while taking their order into account, even for non-sequential domains. We show that our approach extends a previous Markov model-based approach to additionally score all pairwise terms, and that it can be interpreted within the context of Markov random fields. We formulate our underlying combinatorial optimization problem as an integer linear program, and demonstrate that it can be solved quickly in practice. Finally, we perform extensive evaluation of domain context methods and demonstrate that incorporating context increases the number of domain predictions by ∼15%, with our approach dPUC2 (Domain Prediction Using Context) outperforming all competing approaches.
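To make the idea of 'directional' context concrete, the following is a minimal sketch, not the dPUC2 implementation. It assumes a handful of hypothetical candidate domain hits and hypothetical pairwise scores in which the key (x, y) means family x occurs before family y, so the scores for (A, B) and (B, A) can differ. Every ordered pair in a selection is scored, including non-adjacent ones. The paper formulates the selection as an integer linear program; this sketch swaps in brute-force enumeration over non-overlapping subsets, which is only workable for tiny inputs. All names and numbers are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical candidate hits: (family, start, end, individual score).
candidates = [
    ("A", 1, 100, 2.0),
    ("B", 120, 200, -1.0),
    ("C", 90, 150, 0.5),
]

# Hypothetical directional context scores: (x, y) means x occurs before y.
context = {("A", "B"): 3.0, ("B", "A"): -2.0}


def overlaps(h1, h2):
    """Return True if two hits share at least one sequence position."""
    return h1[1] <= h2[2] and h2[1] <= h1[2]


def total_score(hits):
    """Sum individual scores plus a context term for every ordered pair."""
    ordered = sorted(hits, key=lambda h: h[1])
    score = sum(h[3] for h in ordered)
    for i, first in enumerate(ordered):
        # All later hits are paired with `first`, not only the adjacent one.
        for second in ordered[i + 1:]:
            score += context.get((first[0], second[0]), 0.0)
    return score


def best_subset(cands):
    """Brute-force search over non-overlapping subsets (stand-in for the ILP)."""
    best, best_score = (), 0.0  # the empty selection scores zero
    for k in range(1, len(cands) + 1):
        for subset in combinations(cands, k):
            if any(overlaps(a, b) for a, b in combinations(subset, 2)):
                continue
            s = total_score(subset)
            if s > best_score:
                best, best_score = subset, s
    return sorted(best, key=lambda h: h[1]), best_score


best, score = best_subset(candidates)
print([h[0] for h in best], score)
```

In this toy input, hit B has a negative individual score and would be dropped if scored alone, but the positive context term for A preceding B is enough to keep it. That is the kind of rescue that context-aware scoring is meant to allow.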

AVAILABILITY AND IMPLEMENTATION

dPUC2 is available at http://github.com/alexviiia/dpuc2.

CONTACT

mona@cs.princeton.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/921b/5870623/eee86535f51a/btx221f1.jpg
