BayesMotif: de novo protein sorting motif discovery from impure datasets.

Author information

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S66. doi: 10.1186/1471-2105-11-S1-S66.

Abstract

BACKGROUND

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside the cell. This process is precisely regulated by protein sorting signals of different forms. A major category of sorting signals consists of amino acid subsequences, usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms.

METHODS

We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences.
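The abstract does not give the classifier's features or the exact removal criterion, so the following is only a minimal sketch of the general idea, under stated assumptions: a naive Bayes log-odds score over the residues that immediately follow a known anchor, plus a loop that repeatedly drops the lowest-scoring input sequences. All names (`windows`, `iterative_filter`, `drop_per_round`) and the fixed-width window are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Assumption: sequences use only the 20 standard amino acid letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def windows(seq, anchor, width):
    """Yield the `width` residues that follow each occurrence of `anchor`."""
    start = seq.find(anchor)
    while start != -1:
        end = start + len(anchor)
        if end + width <= len(seq):
            yield seq[end:end + width]
        start = seq.find(anchor, start + 1)

def position_log_probs(wins, width, pseudo=1.0):
    """Per-position residue log-probabilities with Laplace smoothing."""
    total = len(wins) + pseudo * len(AMINO_ACIDS)
    table = []
    for i in range(width):
        counts = Counter(w[i] for w in wins)
        table.append({a: math.log((counts[a] + pseudo) / total)
                      for a in AMINO_ACIDS})
    return table

def log_odds(win, pos_model, neg_model):
    """Naive Bayes log-odds that a window comes from the motif class."""
    return sum(pos_model[i][a] - neg_model[i][a] for i, a in enumerate(win))

def best_score(seq, anchor, width, pos_model, neg_model):
    """Score a sequence by its best-scoring anchored window."""
    scores = [log_odds(w, pos_model, neg_model)
              for w in windows(seq, anchor, width)]
    return max(scores) if scores else float("-inf")

def iterative_filter(positives, negatives, anchor, width,
                     drop_per_round=1, rounds=1):
    """Retrain on the kept sequences each round, then drop the worst ones.

    Hypothetical stand-in for a false positive removal step.
    """
    neg_wins = [w for s in negatives for w in windows(s, anchor, width)]
    neg_model = position_log_probs(neg_wins, width)
    kept = list(positives)
    for _ in range(rounds):
        pos_wins = [w for s in kept for w in windows(s, anchor, width)]
        pos_model = position_log_probs(pos_wins, width)
        ranked = sorted(
            kept,
            key=lambda s: best_score(s, anchor, width, pos_model, neg_model),
            reverse=True)
        kept = ranked[:max(1, len(ranked) - drop_per_round)]
    return kept
```

In this sketch the anchor is assumed to be known in advance and the number of rounds is fixed; a real procedure would need a principled stopping rule, which the abstract does not describe.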

RESULTS

Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also showed that the false positive removal procedure can help identify true motifs even when only 20% of the input sequences contain true motif instances.
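To make the notion of an impure implanted-motif dataset concrete, here is a small illustrative generator. It is a sketch under assumptions, not the paper's protocol: the uniform random background, the sequence length, the mutation rate, and the function name `make_implanted_dataset` are all hypothetical. The anchor is implanted exactly, while each position of the motif region may be mutated, mirroring the "conserved anchor, less conserved motif" setting.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_implanted_dataset(n_seqs, purity, anchor, motif,
                           length=40, mut_rate=0.3, seed=0):
    """Return (sequence, has_motif) pairs; a `purity` fraction carries a site.

    Assumed construction: uniform random background, exact anchor,
    motif positions mutated independently with probability `mut_rate`.
    """
    rng = random.Random(seed)
    n_true = round(n_seqs * purity)
    dataset = []
    for i in range(n_seqs):
        residues = [rng.choice(AMINO_ACIDS) for _ in range(length)]
        has_motif = i < n_true
        if has_motif:
            site = anchor + "".join(
                rng.choice(AMINO_ACIDS) if rng.random() < mut_rate else aa
                for aa in motif)
            start = rng.randrange(length - len(site) + 1)
            # Overwrite the background with the implanted site.
            residues[start:start + len(site)] = site
        dataset.append(("".join(residues), has_motif))
    rng.shuffle(dataset)
    return dataset

# 100 sequences, of which 20% carry an implanted anchored motif.
data = make_implanted_dataset(100, 0.2, anchor="KK", motif="LLSGA")
```

The `has_motif` flag is the ground truth; background sequences may still contain the anchor by chance, which is part of what makes such datasets hard.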

CONCLUSION

We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared with conventional motif discovery algorithms such as MEME, our algorithm can find less conserved motifs with short, highly conserved anchors. Our algorithm also makes it easy to incorporate additional meta-sequence features, such as the hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
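As an illustration of what such meta-sequence features might look like, the sketch below computes the mean hydropathy and a crude net charge for a candidate motif window. The choice of the Kyte-Doolittle hydropathy scale, the charge rule (K and R as +1, D and E as -1, histidine ignored), and the function name `meta_features` are assumptions made for this example; the abstract does not say how the paper defines these features.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplifying assumption: only K/R and D/E carry charge; histidine is ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def meta_features(window):
    """Return mean hydropathy and net charge of a non-empty residue window."""
    hydropathy = sum(KYTE_DOOLITTLE[a] for a in window) / len(window)
    net_charge = sum(CHARGE.get(a, 0) for a in window)
    return {"mean_hydropathy": hydropathy, "net_charge": net_charge}

features = meta_features("KDEL")
```

Features like these summarize a whole window rather than a single position, so they can be appended to a classifier's per-position inputs, which a plain PWM cannot express.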

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/78f6/3009540/98d31432bc80/1471-2105-11-S1-S66-1.jpg
