CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

Author information

Cui Xuefeng, Lu Zhiwu, Wang Sheng, Wang Jim Jing-Yan, Gao Xin

Affiliations

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.

Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i332-i340. doi: 10.1093/bioinformatics/btw271.

Abstract

MOTIVATION

Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite advances over recent decades in sequence alignment, threading and alignment-free methods, it remains a challenging open problem. Recently, network methods that search for transitive paths in the protein structure space have demonstrated the importance of incorporating network information from the structure space. Yet current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency when combining different sources of information.

METHOD

We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
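
The idea of keeping two separate networks can be illustrated with a toy score-propagation sketch. This is not the CMsearch algorithm, whose objective and update rules are not given in this abstract. It is a generic diffusion scheme under stated assumptions: the function names, the mixing weight `alpha`, the iteration count and all similarity values below are hypothetical. It only shows how a query's sequence-based scores might spread over a sequence network and a structure network that are stored separately rather than merged.

```python
# Illustrative sketch only: NOT the CMsearch algorithm from the paper.
# Function names, alpha, and the toy similarity values are assumptions.

def row_normalize(matrix):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def propagate(network, scores):
    """One diffusion step: each protein takes the weighted mean of its neighbors' scores."""
    return [sum(w * s for w, s in zip(row, scores)) for row in network]


def cross_modal_scores(seq_net, struct_net, query_seq_sim, alpha=0.5, iterations=20):
    """Diffuse query scores over two separate networks and average the results."""
    seq_net = row_normalize(seq_net)
    struct_net = row_normalize(struct_net)
    scores = list(query_seq_sim)
    for _ in range(iterations):
        seq_step = propagate(seq_net, scores)        # sequence-space neighbors
        struct_step = propagate(struct_net, scores)  # structure-space neighbors
        scores = [
            alpha * q + (1 - alpha) * 0.5 * (a + b)
            for q, a, b in zip(query_seq_sim, seq_step, struct_step)
        ]
    return scores


# Toy database of four proteins (hypothetical similarity weights).
# Sequence network: only proteins 0 and 1 are sequence-similar.
seq_net = [
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Structure network: protein 2 is structurally similar to protein 1.
struct_net = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# The query has direct sequence similarity to protein 0 only.
query_seq_sim = [1.0, 0.0, 0.0, 0.0]

scores = cross_modal_scores(seq_net, struct_net, query_seq_sim)
print(scores)
```

In this toy setup, protein 2 has no sequence similarity to the query or to any other protein, yet it receives a non-zero score because it is reachable through the structure network. Protein 3, which is isolated in both networks, keeps a score of zero.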

RESULTS

We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

AVAILABILITY AND IMPLEMENTATION

Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx

CONTACT

xin.gao@kaust.edu.sa

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84d4/4908355/f14af9618728/btw271f1p.jpg
