Identifying discriminative classification-based motifs in biological sequences.

Affiliation

Katholieke Universiteit Leuven, Department of Computer Science, Celestijnenlaan 200A, Leuven, Belgium.

Publication information

Bioinformatics. 2011 May 1;27(9):1231-8. doi: 10.1093/bioinformatics/btr110. Epub 2011 Mar 3.

Abstract

MOTIVATION

Identifying conserved motifs in biological sequences is crucial for uncovering shared functions. Many tools exist for motif identification, including some that allow degenerate positions with multiple possible nucleotides or amino acids. Most efficient methods available today search for conserved motifs in a set of sequences, but do not check their specificity with respect to a set of negative sequences.

RESULTS

We present a tool to identify degenerate motifs, based on a given classification of amino acids according to their physico-chemical properties. It returns the top K motifs that are most frequent in a positive set of sequences involved in a biological process of interest and absent from a negative set. Our method thus discovers discriminative motifs in biological sequences that may be used to identify new sequences involved in the same process. We used this tool to identify candidate effector proteins secreted into plant tissues by the root-knot nematode Meloidogyne incognita. The tool identified a series of motifs present in a positive set of known effectors and entirely absent from a negative set of evolutionarily conserved housekeeping proteins. Scanning the proteome of M. incognita, we detected 2579 proteins that contain these specific motifs and can be considered new putative effectors.
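The selection criterion described above (frequent in the positive set, absent from the negative set, with positions allowed to generalize to an amino acid class) can be illustrated with a minimal sketch. This is not the published tool's algorithm: it assumes fixed-length contiguous motifs, a simplified four-class grouping of amino acids chosen only for illustration, and hypothetical function names (`motifs_in`, `top_k_discriminative`, `scan`).

```python
from collections import Counter
from itertools import product

# Hypothetical, simplified physico-chemical classes (an assumption for
# illustration; the abstract does not specify the classification used).
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def motifs_in(seq, length):
    """Return every degenerate motif of the given length occurring in seq.

    Each position of a motif is either a concrete amino acid ("K") or the
    class that amino acid belongs to ("<positive>").
    """
    found = set()
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if any(aa not in AA_TO_CLASS for aa in window):
            continue  # skip windows with unknown residues such as "X"
        options = [(aa, "<" + AA_TO_CLASS[aa] + ">") for aa in window]
        found.update(product(*options))
    return found


def top_k_discriminative(positives, negatives, length=3, k=5):
    """Top k motifs by number of positive sequences containing them,
    keeping only motifs that occur in no negative sequence."""
    support = Counter()
    for seq in positives:
        support.update(motifs_in(seq, length))  # one count per sequence
    forbidden = set()
    for seq in negatives:
        forbidden |= motifs_in(seq, length)
    candidates = [(m, n) for m, n in support.items() if m not in forbidden]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:k]


def scan(proteome, motifs):
    """Names of sequences in a {name: sequence} dict containing any motif."""
    motifs = set(motifs)
    lengths = {len(m) for m in motifs}
    return [
        name
        for name, seq in proteome.items()
        if any(motifs & motifs_in(seq, n) for n in lengths)
    ]


if __name__ == "__main__":
    positives = ["MKRDA", "AKRDL"]  # toy sequences, not real effectors
    negatives = ["MKRAA"]
    for motif, count in top_k_discriminative(positives, negatives):
        print("".join(motif), count)
```

Enumerating both the concrete residue and its class at each position costs 2^length variants per window, which is acceptable for short motifs but would need pruning for longer ones or for motifs with gaps.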

AVAILABILITY AND IMPLEMENTATION

The motif discovery tool and the proteins used in the experiments are available at http://dtai.cs.kuleuven.be/ml/systems/merci.
