Saidi Rabie, Dhifli Wajdi, Maddouri Mondher, Mephu Nguifo Engelbert
1 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom.
2 University of Lille, Faculty of Pharmaceutical and Biological Sciences, EA2694, F-59000 Lille, France.
J Comput Biol. 2019 Jun;26(6):561-571. doi: 10.1089/cmb.2018.0171. Epub 2018 Dec 5.
Studying protein structures is essential for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown extremely large, yet classifying a protein structure remains a difficult, costly, and time-consuming task. Exploring the spatial information in protein structures can provide important functional and structural insights. In this context, spatial motifs may correspond to relevant fragments that are useful for a better understanding of proteins. In this article, we propose AntMot, a fast algorithm for finding spatial motifs in protein three-dimensional structures. It extends the Karp-Miller-Rosenberg repetition finder, which was originally designed for sequences. The extracted motifs, termed ant-motifs, have an ant-like shape composed of a backbone fragment from the primary structure, enriched with spatial refinements. We show that these motifs are biologically sound, and we use them as descriptors in the classification of several benchmark datasets. Experimental results show that our approach offers a trade-off between sequential motifs and subgraph motifs in terms of the number of extracted substructures, while significantly improving classification accuracy over sequential motifs, frequent-subgraph motifs, and alignment-based approaches.
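To make the starting point concrete, the following is a minimal sketch of the classic sequence-level Karp-Miller-Rosenberg idea only, not of AntMot or its spatial extension, which the abstract does not detail. It assumes the standard formulation in which every substring of a given length receives an integer label, and labels for longer substrings are built from pairs of labels for shorter ones. The function names `kmr_repeats` and `_combine` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def _combine(labels, a, b, n):
    """Build labels for length-b substrings from labels for length-a
    substrings, where a < b <= 2a. Two length-b substrings are equal
    exactly when their first a symbols and their last a symbols match."""
    shift = b - a
    pairs = [(labels[i], labels[i + shift]) for i in range(n - b + 1)]
    rank = {p: r for r, p in enumerate(sorted(set(pairs)))}
    return [rank[p] for p in pairs]


def kmr_repeats(seq, k):
    """Return {substring: [start positions]} for every substring of
    length k that occurs at least twice in seq."""
    n = len(seq)
    if k < 1 or k > n:
        return {}
    # Length-1 labels: one integer per distinct symbol.
    alphabet = {s: r for r, s in enumerate(sorted(set(seq)))}
    labels = [alphabet[s] for s in seq]
    length = 1
    # Doubling phase: 1 -> 2 -> 4 -> ... while the doubled length fits in k.
    while length * 2 <= k:
        labels = _combine(labels, length, length * 2, n)
        length *= 2
    # Final phase: reach k exactly by overlapping two shorter substrings.
    if length < k:
        labels = _combine(labels, length, k, n)
    # Positions sharing a label start identical length-k substrings.
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    return {seq[p[0]:p[0] + k]: p for p in groups.values() if len(p) > 1}


if __name__ == "__main__":
    print(kmr_repeats("ABCABCAB", 3))
    # {'ABC': [0, 3], 'BCA': [1, 4], 'CAB': [2, 5]}
```

In this sketch the input is a plain string, such as an amino acid sequence. How AntMot attaches spatial neighbors to the repeated backbone fragments is described in the full article and is not reproduced here.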