MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.

Author information

USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA.

Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.

Publication information

Bioinformatics. 2018 Apr 15;34(8):1270-1277. doi: 10.1093/bioinformatics/btx755.

Abstract

MOTIVATION

Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to the complexity of protein structures and, consequently, of their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure.

RESULTS

The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.

AVAILABILITY AND IMPLEMENTATION

MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot.

CONTACT

emoriyama2@unl.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
