Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
Genomics. 2018 Jan;110(1):50-58. doi: 10.1016/j.ygeno.2017.08.005. Epub 2017 Aug 14.
Many efforts have been made to predict the subcellular localization of eukaryotic proteins, but most existing methods have two limitations: (1) they cover fewer than ten locations, so many organelles in a eukaryotic cell are not covered, and (2) they can only handle single-label systems, in which each constituent protein has one and only one location. In fact, proteins with multiple locations are particularly interesting, since they may have exceptional functions that are important both for an in-depth understanding of the biological processes in a cell and for selecting drug targets. Although several predictors (such as "Euk-mPLoc", "Euk-PLoc 2.0" and "iLoc-Euk") can cover up to 22 different location sites and can also treat multi-labeled proteins, further efforts are needed to improve their prediction quality, particularly in raising the absolute true rate and lowering the absolute false rate. Here we propose a new predictor called "pLoc-mEuk", developed by extracting the key GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on a high-quality and stringent benchmark dataset have indicated that the proposed pLoc-mEuk predictor is remarkably superior to iLoc-Euk, the best of the aforementioned three predictors. For the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mEuk/, by which users can easily get their desired results without having to go through the complicated mathematics involved.
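The abstract names the absolute true rate and the absolute false rate as the quality measures to improve, but does not define them. As a minimal sketch, assuming the definitions commonly used for multi-label predictors (absolute true is the fraction of proteins whose predicted location set exactly matches the experimental set; absolute false is the mean number of wrongly assigned or missed locations per protein, divided by the total number of locations), the two rates could be computed as follows. The function names and the toy data are hypothetical and are not taken from pLoc-mEuk.

```python
from typing import Sequence, Set


def absolute_true(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Fraction of proteins whose predicted label set equals the true set exactly."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty sequences of equal length")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


def absolute_false(
    true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]], num_labels: int
) -> float:
    """Mean count of mismatched labels per protein, normalised by the label count.

    For each protein the mismatch count is |union| - |intersection|, i.e. the
    size of the symmetric difference between predicted and true label sets.
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty sequences of equal length")
    if num_labels <= 0:
        raise ValueError("num_labels must be positive")
    mismatched = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return mismatched / (len(true_sets) * num_labels)


# Toy example (made-up data): three proteins, 22 possible location sites.
true_locations = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}]
pred_locations = [{"nucleus"}, {"cytoplasm"}, {"cytoplasm"}]

at = absolute_true(true_locations, pred_locations)
af = absolute_false(true_locations, pred_locations, num_labels=22)
print(f"absolute true:  {at:.4f}")
print(f"absolute false: {af:.4f}")
```

Under these assumed definitions, absolute true coincides with what is often called subset accuracy and absolute false with the Hamming loss, so a predictor improves by pushing the first toward 1 and the second toward 0.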