Caboche Ségolène, Pupin Maude, Leclère Valérie, Jacques Philippe, Kucherov Gregory
Computer Science Laboratory of Lille, UMR USTL/CNRS 8022, INRIA, F59655, Villeneuve d'Ascq, France.
BMC Struct Biol. 2009 Mar 18;9:15. doi: 10.1186/1472-6807-9-15.
Nonribosomal peptides (NRPs), bioactive secondary metabolites produced by many microorganisms, show a broad range of important biological activities (e.g. as antibiotics, immunosuppressants, or antitumor agents). NRPs are mainly composed of amino acids, but their primary structure is not always linear and can contain cycles or branchings. Furthermore, several hundred different monomers can be incorporated into NRPs. The NORINE database, the first resource entirely dedicated to NRPs, currently stores more than 700 NRPs annotated with their monomeric peptide structure, encoded as undirected labeled graphs. This opens the way to a systematic analysis of structural patterns occurring in NRPs. Such studies can investigate the functional role of some monomeric chains, or analyse NRPs that have been computationally predicted from the synthetase protein sequence. A basic operation in such analyses is the search for a given structural pattern in the database.
We developed an efficient method that allows for a quick search for a structural pattern in the NORINE database. The method identifies all peptides containing a pattern substructure of a given size. This amounts to solving a variant of the maximum common subgraph problem on pattern and peptide graphs, which is done by computing cliques in an appropriate compatibility graph.
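The clique-based idea can be illustrated with a minimal Python sketch. It is not the NORINE implementation: it assumes a textbook compatibility graph (the label-restricted modular product of the two graphs) and a plain Bron-Kerbosch search with pivoting, and it matches induced substructures only, whereas the exact variant solved by the authors may differ. The function names, the graph encoding (a dict of node labels plus an edge list), and the example monomers are all hypothetical.

```python
from itertools import combinations


def compatibility_graph(pat_labels, pat_edges, pep_labels, pep_edges):
    """Build the compatibility graph of a pattern and a peptide graph.

    Nodes are pairs (p, q) of a pattern node and a peptide node carrying
    the same monomer label. Two pairs are linked when they use distinct
    nodes on both sides and agree on adjacency (both linked or both not).
    """
    pat_adj = {frozenset(e) for e in pat_edges}
    pep_adj = {frozenset(e) for e in pep_edges}
    nodes = [(p, q) for p in pat_labels for q in pep_labels
             if pat_labels[p] == pep_labels[q]]
    adj = {v: set() for v in nodes}
    for (p1, q1), (p2, q2) in combinations(nodes, 2):
        if p1 == p2 or q1 == q2:
            continue  # a node may be matched at most once
        in_pat = frozenset((p1, p2)) in pat_adj
        in_pep = frozenset((q1, q2)) in pep_adj
        if in_pat == in_pep:
            adj[(p1, q1)].add((p2, q2))
            adj[(p2, q2)].add((p1, q1))
    return adj


def max_clique(adj):
    """Return one maximum clique (Bron-Kerbosch with pivoting and pruning)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def contains_pattern(pattern, peptide, k):
    """True if pattern and peptide share an induced substructure of k monomers."""
    adj = compatibility_graph(*pattern, *peptide)
    return len(max_clique(adj)) >= k


# Hypothetical data: a linear pattern Leu-Val-Orn and a 5-monomer cyclic peptide.
pattern = ({0: "Leu", 1: "Val", 2: "Orn"}, [(0, 1), (1, 2)])
peptide = ({0: "Orn", 1: "Leu", 2: "Val", 3: "Orn", 4: "Gly"},
           [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
mapping = sorted(max_clique(compatibility_graph(*pattern, *peptide)))
```

Each clique of size k in the compatibility graph corresponds to a one-to-one mapping of k pattern monomers onto k peptide monomers that preserves labels and links, so searching a database reduces to running `contains_pattern` on every stored peptide graph. The exhaustive clique search is exponential in the worst case; it stays practical here only because the graphs in this sketch are small.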
The method has been incorporated into the NORINE database, available at http://bioinfo.lifl.fr/norine. Less than one second is needed to search for a pattern in the entire database.