pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset.

Author information

Xiao Xuan, Cheng Xiang, Chen Genqiang, Mao Qi, Chou Kuo-Chen

Affiliations

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.

Gordon Life Science Institute, Boston, MA 02478, United States.

Publication information

Med Chem. 2019;15(5):496-509. doi: 10.2174/1573406415666181217114710.

Abstract

BACKGROUND/OBJECTIVE

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Given the avalanche of protein sequences emerging in the post-genomic age, there is an urgent need for computational tools that can identify their subcellular localization promptly and effectively from sequence information alone. Recently, a predictor called "pLoc-mVirus" was developed to identify the subcellular localization of virus proteins. Its performance is far better than that of other predictors built for the same purpose, particularly on multi-label systems, in which some proteins, known as "multiplex proteins", may occur simultaneously in, or move between, two or more subcellular locations. Although it is indeed a very powerful predictor, there is still room for improvement: pLoc-mVirus was trained on an extremely skewed dataset in which some subsets were more than 10 times the size of others, so its predictions are inevitably biased by this uneven training data.

METHODS

Using Chou's general PseAAC (Pseudo Amino Acid Composition) approach, together with the IHTS (Inserting Hypothetical Training Samples) treatment to balance the training dataset, we developed a new predictor called "pLoc_bal-mVirus" for predicting the subcellular localization of multi-label virus proteins.
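The abstract names the two ingredients but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. For the feature side it uses Chou's classic type-1 PseAAC (20 normalized amino acid frequencies plus λ sequence-order correlation factors, combined with a weight w) as one concrete instance of the general PseAAC framework; the descriptors actually used in pLoc_bal-mVirus may differ. For the balancing side it substitutes SMOTE-style interpolation between nearest neighbours inside each location subset as a stand-in for IHTS; whether this matches the authors' IHTS procedure is an assumption. The function names (`pseaac`, `interpolate_minority`, `balance`), the toy property scales, the sequences and the location labels are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Rescale a 20-value property scale to zero mean and unit (population) std."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (scale[aa] - mean) / std for aa in AMINO_ACIDS}


def pseaac(seq, scales, lam=3, w=0.05):
    """Type-1 PseAAC vector: 20 composition terms followed by lam correlation terms."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if len(residues) <= lam:
        raise ValueError("sequence must contain more than lam standard residues")
    std_scales = [standardize(s) for s in scales]
    # Normalized occurrence frequencies of the 20 amino acids (they sum to 1).
    freqs = [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]

    def corr(a, b):
        # Mean squared difference of the standardized properties of residues a and b.
        return sum((s[b] - s[a]) ** 2 for s in std_scales) / len(std_scales)

    # Tier-k correlation factors theta_k for k = 1..lam.
    thetas = []
    for k in range(1, lam + 1):
        pairs = len(residues) - k
        thetas.append(sum(corr(residues[i], residues[i + k]) for i in range(pairs)) / pairs)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def interpolate_minority(samples, n_new, k=3, seed=0):
    """SMOTE-style stand-in for IHTS: each new vector lies on the segment between
    a real sample and one of its k nearest neighbours within the same subset."""
    if n_new > 0 and len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def balance(subsets, k=3, seed=0):
    """Top up every location subset to the size of the largest one."""
    target = max(len(vecs) for vecs in subsets.values())
    return {
        label: vecs + interpolate_minority(vecs, target - len(vecs), k, seed)
        for label, vecs in subsets.items()
    }


# Toy scales for illustration only: NOT real physicochemical data.
toy_scales = [
    {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)},
    {aa: float((7 * i) % 20) for i, aa in enumerate(AMINO_ACIDS)},
]
# Hypothetical, deliberately skewed training subsets (arbitrary sequence strings).
subsets = {
    "location_A": [pseaac(s, toy_scales) for s in [
        "MKTAYIAKQRQISFVKSHFSRQ", "MSDNEKLLQAAGWVLKRTPEEH",
        "MGLSDGEWQLVLNVWGKVEADI", "MVLSPADKTNVKAAWGKVGAHA",
        "MNIFEMLRIDEGLRLKIYKDTE", "MAHHHHHHVDDDDKMSKGEELF"]],
    "location_B": [pseaac(s, toy_scales) for s in [
        "MTEYKLVVVGAGGVGKSALTIQ", "MQIFVKTLTGKTITLEVEPSDT"]],
}
balanced = balance(subsets)
print({label: len(vecs) for label, vecs in balanced.items()})
# {'location_A': 6, 'location_B': 6}
```

Each PseAAC vector has 20 + λ components summing to 1, and balancing tops every subset up to the size of the largest one. Interpolating between existing feature vectors, rather than duplicating them, keeps the added samples inside the region already occupied by the real minority data; a faithful reimplementation would need the exact IHTS rules and feature definitions from the full paper.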

RESULTS

Cross-validation tests on exactly the same experimentally confirmed dataset indicate that the proposed predictor is remarkably superior to pLoc-mVirus, the existing state-of-the-art predictor for the same purpose.

CONCLUSION

A user-friendly web server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, through which experimental scientists can easily obtain the results they need without working through the detailed underlying mathematics. Accordingly, pLoc_bal-mVirus should prove a very useful tool for designing multi-target drugs and for gaining an in-depth understanding of biological processes in the cell.

