CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

Author information

Cui Xuefeng, Lu Zhiwu, Wang Sheng, Jing-Yan Wang Jim, Gao Xin

Affiliations

King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.

Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i332-i340. doi: 10.1093/bioinformatics/btw271.

DOI: 10.1093/bioinformatics/btw271

PMID: 27307635

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908355/

Abstract

MOTIVATION

Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
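
As a rough illustration of how a network method can exploit transitive paths, here is a minimal sketch that scores proteins by random walk with restart (RWR) on a similarity network. RWR is a generic network-propagation technique chosen only for illustration; it is not claimed to be the algorithm of CMsearch or of any particular published network method. The function name, the restart probability of 0.3 and the toy network are assumptions. The point of the example is that protein 2 receives a nonzero score through protein 1, even though it has no direct edge to the query.

```python
def random_walk_with_restart(adj, query, restart=0.3, tol=1e-10, max_iter=1000):
    """Illustrative sketch: score every protein by its network proximity to `query`.

    adj:     symmetric list-of-lists of non-negative edge weights (similarities).
    query:   index of the query protein.
    restart: probability of jumping back to the query at each step (assumed value).
    """
    n = len(adj)
    # Column sums turn edge weights into transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    restart_vec = [1.0 if i == query else 0.0 for i in range(n)]
    p = restart_vec[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbours.
            walk = sum(adj[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            new_p.append((1 - restart) * walk + restart * restart_vec[i])
        # Stop once the score vector no longer changes appreciably.
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy network (hypothetical): query 0 resembles protein 1, protein 1 resembles
# protein 2, but 0 and 2 share no direct edge; protein 3 is unconnected.
toy = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
scores = random_walk_with_restart(toy, query=0)
print([round(s, 3) for s in scores])
```

Because every path from the query contributes to a node's score, an indirect link (0 to 1 to 2) still raises protein 2 above the unconnected protein 3.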

METHOD

We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
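
A minimal sketch of the first step described above: keeping the sequence space and the structure space as two separate networks instead of one merged network. It assumes each space is given as a pairwise similarity matrix (for example, HMM-HMM alignment scores for sequences and some structural alignment score for structures) and that each network is a symmetrised k-nearest-neighbour graph. The helper build_knn_network, the value of k and the toy matrices are hypothetical. The cross-modal learning that CMsearch then performs over the two networks is not reproduced here.

```python
def build_knn_network(sim, k):
    """Hypothetical helper: symmetrised k-nearest-neighbour network.

    sim: square list-of-lists, sim[i][j] = similarity of proteins i and j.
    k:   number of most similar neighbours each protein links to.
    Returns an adjacency matrix with edge weight = similarity, 0.0 = no edge.
    """
    n = len(sim)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other proteins by similarity to protein i, highest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            # Symmetrise: keep the edge if either endpoint picks the other.
            w = max(sim[i][j], sim[j][i])
            adj[i][j] = w
            adj[j][i] = w
    return adj


# Toy similarity matrices for four proteins (hypothetical values).
seq_sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
struct_sim = [
    [1.0, 0.6, 0.9, 0.3],
    [0.6, 1.0, 0.5, 0.4],
    [0.9, 0.5, 1.0, 0.8],
    [0.3, 0.4, 0.8, 1.0],
]

# Two separate networks, each built only from its own similarity metric.
seq_net = build_knn_network(seq_sim, k=1)
struct_net = build_knn_network(struct_sim, k=1)
print("sequence edge 0-2:", seq_net[0][2])
print("structure edge 0-2:", struct_net[0][2])
```

Because each network is built only from its own metric, the two graphs can disagree: in the toy data, proteins 0 and 2 are linked in the structure network but not in the sequence network, and no rescaling is needed to put sequence and structure scores on a common scale.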

RESULTS

We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

AVAILABILITY AND IMPLEMENTATION

Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx

CONTACT

xin.gao@kaust.edu.sa

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
