

SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data.

Affiliation

Max Planck Institute of Molecular Plant Physiology Potsdam, Germany.

Publication information

Front Plant Sci. 2011 Sep 12;2:43. doi: 10.3389/fpls.2011.00043. eCollection 2011.

Abstract

Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well-performing in silico prediction tool is still needed. Existing tools, which use information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To this end, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence-derived information, expression behavior, or a combination of both, and compared their predictive performance in a cross-validation test. We show that gene expression carries information about subcellular localization that is not available in sequence information, yielding dramatic benefits for plastid localization prediction and notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, which combines gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even when only a partial protein sequence is available (e.g., sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that advances in gene expression measurement technology will make our method applicable to all Arabidopsis proteins.
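The comparison described in the abstract (support vector machines trained on sequence features, expression features, or both, then scored by cross-validation) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from SLocX: the data are synthetic, the labels and feature names are hypothetical, the kernel is linear, and the SVM is trained with the Pegasos stochastic subgradient method using only the Python standard library.

```python
import random


def make_synthetic_proteins(n=200, seed=0):
    """Toy data (hypothetical): two 'sequence' and two 'expression' features per protein."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([-1, 1])  # e.g. +1 = "plastid", -1 = "other" (illustrative only)
        # In each block the first feature carries signal and the second is pure noise.
        seq = [label * 1.0 + rng.gauss(0, 1), rng.gauss(0, 1)]
        expr = [label * 1.0 + rng.gauss(0, 1), rng.gauss(0, 1)]
        data.append((seq, expr, label))
    return data


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos; no bias term, since the toy data are centered at zero."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:  # hinge loss is active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def cross_val_accuracy(X, y, k=5):
    """k-fold cross-validation; returns accuracy pooled over all held-out examples."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            pred = 1 if score >= 0 else -1
            correct += pred == y[i]
    return correct / len(X)


def compare_feature_sets(seed=0):
    """Score sequence-only, expression-only, and combined feature sets the same way."""
    data = make_synthetic_proteins(seed=seed)
    y = [label for _, _, label in data]
    feature_sets = {
        "sequence": [seq for seq, _, _ in data],
        "expression": [expr for _, expr, _ in data],
        "combined": [seq + expr for seq, expr, _ in data],
    }
    return {name: cross_val_accuracy(X, y) for name, X in feature_sets.items()}


if __name__ == "__main__":
    for name, acc in compare_feature_sets().items():
        print(f"{name}: {acc:.3f}")
```

Because the two synthetic feature blocks carry independent noise, the combined model would typically score at least as well as either block alone. That mirrors the abstract's argument only qualitatively and says nothing about the accuracies reported for SLocX.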


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edeb/3355584/b4730686a29b/fpls-02-00043-g001.jpg
