Robert-Cedergren Center for Bioinformatics and Genomics; Biochemistry Department, Université de Montréal, 2900 Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada.
BMC Bioinformatics. 2010 Nov 15;11:563. doi: 10.1186/1471-2105-11-563.
The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how biological processes are organized across cellular compartments, but also helps to unravel the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we found that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data. This limits the exploitation of the largest and taxonomically most comprehensive body of sequence information from eukaryotes.
We developed a new predictor, TESTLoc, for predicting the subcellular localization of proteins from partial sequences conceptually translated from ESTs (EST-peptides). TESTLoc uses a Support Vector Machine (SVM) as its computational method and represents EST-peptides by features such as amino acid composition and physicochemical properties. When applied to the most challenging test case (plant data), TESTLoc yielded high accuracy (~85%).
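As an illustration of the kind of feature representation described above, the following minimal sketch turns a peptide into a numeric vector of amino acid composition plus one physicochemical property. It is not TESTLoc's actual feature set, which the abstract does not specify. The function names (`composition`, `mean_hydropathy`, `featurize`) are hypothetical, and the choice of the Kyte-Doolittle hydropathy scale as the physicochemical feature is an assumption made for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# feature vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here as an example of a
# physicochemical property; the abstract does not name the properties used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(peptide):
    """Return the standard residues of a peptide, skipping X, * and similar."""
    residues = [aa for aa in peptide.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("peptide contains no standard amino acids")
    return residues


def composition(peptide):
    """Fraction of each of the 20 standard amino acids in the peptide."""
    residues = _standard_residues(peptide)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over the standard residues."""
    residues = _standard_residues(peptide)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def featurize(peptide):
    """21-dimensional vector: 20 composition values plus mean hydropathy."""
    return composition(peptide) + [mean_hydropathy(peptide)]


if __name__ == "__main__":
    # A short, made-up EST-peptide fragment used purely for illustration.
    vector = featurize("MALLRSAVGX")
    print(len(vector), [round(v, 3) for v in vector])
```

Vectors of this form, computed for peptides with known localization, could then be passed to any SVM implementation for training (for example `sklearn.svm.SVC`, if scikit-learn is available). Skipping ambiguous residues such as `X` is one simple design choice for partial, error-prone EST translations. Other treatments are possible.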
TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html.