
pLoc_bal-mPlant: Predict Subcellular Localization of Plant Proteins by General PseAAC and Balancing Training Dataset.

Affiliations

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.

The Gordon Life Science Institute, Boston, MA 02478, United States.

Publication information

Curr Pharm Des. 2018;24(34):4013-4022. doi: 10.2174/1381612824666181119145030.

DOI: 10.2174/1381612824666181119145030
PMID: 30451108

Abstract

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools that identify subcellular localization in a timely and effective manner from sequence information alone. Recently, a predictor called "pLoc-mPlant" was developed for identifying the subcellular localization of plant proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is a very powerful predictor, further effort is needed to improve it, because pLoc-mPlant was trained on an extremely skewed dataset in which some subsets (i.e., the protein numbers for some subcellular locations) were more than 10 times larger than others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. To overcome this bias, we have developed a new and bias-free predictor called pLoc_bal-mPlant by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mPlant, the existing state-of-the-art predictor for identifying the subcellular localization of plant proteins. To maximize convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, by which users can easily get their desired results without the need to go through the detailed mathematics.
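
To make the imbalance problem concrete, the sketch below measures how skewed a toy label set is and then evens it out by plain random oversampling. This is an illustration only: the abstract does not say how pLoc_bal-mPlant balances its training data, so random oversampling is a stand-in technique, not the authors' method. The function names, the location names, and the sample counts are all hypothetical, and the example is simplified to one label per protein, whereas the predictor handles multi-label proteins.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches
    the largest one. A generic technique, assumed here for illustration."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra copies (with replacement) only for classes below the target.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 22 proteins in one location, 2 in another.
labels = ["chloroplast"] * 22 + ["peroxisome"] * 2
samples = [f"protein_{i}" for i in range(len(labels))]

print(imbalance_ratio(labels))  # largest class is 11 times the smallest

balanced_samples, balanced_labels = oversample(samples, labels)
print(Counter(balanced_labels))
```

In this toy set the larger class is 11 times the smaller one, which matches the "more than 10 times" skew described in the abstract. After oversampling, both classes have 22 entries. Duplicating samples adds no new information, so it reduces bias toward the large class without making the small class better characterized.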


