pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC.

Affiliations

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.

Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; The Gordon Life Science Institute, Boston, MA 02478, USA.

Publication information

J Theor Biol. 2018 Dec 7;458:92-102. doi: 10.1016/j.jtbi.2018.09.005. Epub 2018 Sep 8.

DOI:10.1016/j.jtbi.2018.09.005

PMID:30201434

Abstract

One of the hottest topics in molecular cell biology is to determine the subcellular localization of proteins from various different organisms. This is because it is crucially important for both basic research and drug development. Recently, a predictor called "pLoc-mGneg" was developed for identifying the subcellular localization of Gram-negative bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGneg was trained by an extremely skewed dataset in which some subset (subcellular location) was about 5 to 70 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven training dataset. To alleviate such a consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGneg by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGneg, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-negative bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGneg/, by which users can easily get their desired results without the need to go through the detailed mathematics.
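The abstract attributes the bias of pLoc-mGneg to a training set in which some location subsets are about 5 to 70 times larger than others, but it does not describe the balancing procedure itself. The following is a minimal sketch of one generic way to quasi-balance a skewed dataset, namely random oversampling of the smaller classes until no class is more than `max_ratio` times smaller than the largest. It is not the authors' method. The function names, the `max_ratio` parameter, and the class counts are hypothetical, and the sketch treats each protein as single-label, whereas the actual predictor handles multi-label proteins.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def quasi_balance(samples, labels, max_ratio=2.0, seed=0):
    """Randomly duplicate minority-class samples (illustrative, not the paper's procedure).

    After the call, the largest class is at most `max_ratio` times the
    size of any other class. Original samples are all kept.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class must reach at least this many members.
    largest = max(len(members) for members in by_class.values())
    target = math.ceil(largest / max_ratio)

    new_samples, new_labels = [], []
    for label, members in by_class.items():
        extra = max(0, target - len(members))
        grown = members + [rng.choice(members) for _ in range(extra)]
        new_samples.extend(grown)
        new_labels.extend([label] * len(grown))
    return new_samples, new_labels


# Hypothetical toy dataset with a 70:1 skew between two locations.
labels = ["cell inner membrane"] * 70 + ["periplasm"] * 14 + ["fimbrium"] * 1
samples = [f"protein_{i}" for i in range(len(labels))]

print(imbalance_ratio(labels))  # 70.0
balanced_samples, balanced_labels = quasi_balance(samples, labels)
print(imbalance_ratio(balanced_labels))  # 2.0
```

Plain duplication is the simplest option and is used here only to make the imbalance ratio concrete. Methods that synthesize new minority samples instead of copying existing ones follow the same outline but replace the `rng.choice` step.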

Similar articles

pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC.

J Theor Biol. 2018 Dec 7;458:92-102. doi: 10.1016/j.jtbi.2018.09.005. Epub 2018 Sep 8.

pLoc_bal-mGpos: Predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC.

Genomics. 2019 Jul;111(4):886-892. doi: 10.1016/j.ygeno.2018.05.017. Epub 2018 May 26.

pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset.

Med Chem. 2019;15(5):496-509. doi: 10.2174/1573406415666181217114710.

pLoc_bal-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by General PseAAC and Quasi-balancing Training Dataset.

Med Chem. 2019;15(5):472-485. doi: 10.2174/1573406415666181218102517.

pLoc_bal-mPlant: Predict Subcellular Localization of Plant Proteins by General PseAAC and Balancing Training Dataset.

Curr Pharm Des. 2018;24(34):4013-4022. doi: 10.2174/1381612824666181119145030.

pLoc_bal-mAnimal: predict subcellular localization of animal proteins by balancing training dataset and PseAAC.

Bioinformatics. 2019 Feb 1;35(3):398-406. doi: 10.1093/bioinformatics/bty628.

pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset.

Genomics. 2019 Dec;111(6):1274-1282. doi: 10.1016/j.ygeno.2018.08.007. Epub 2018 Sep 1.

pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC.

Genomics. 2017 Oct 6. doi: 10.1016/j.ygeno.2017.10.002.

pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC.

Gene. 2017 Sep 10;628:315-321. doi: 10.1016/j.gene.2017.07.036. Epub 2017 Jul 18.

pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC.

Genomics. 2018 Jan;110(1):50-58. doi: 10.1016/j.ygeno.2017.08.005. Epub 2017 Aug 14.

Cited by

GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion.

BMC Genomics. 2024 Oct 30;25(1):1019. doi: 10.1186/s12864-024-10954-3.

Hemolytic-Pred: A machine learning-based predictor for hemolytic proteins using position and composition-based features.

Digit Health. 2023 Jul 5;9:20552076231180739. doi: 10.1177/20552076231180739. eCollection 2023 Jan-Dec.

A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns.

Sci Rep. 2022 Sep 7;12(1):15183. doi: 10.1038/s41598-022-19099-3.

Identifying Pupylation Proteins and Sites by Incorporating Multiple Methods.

Front Endocrinol (Lausanne). 2022 Apr 26;13:849549. doi: 10.3389/fendo.2022.849549. eCollection 2022.

iSUMOK-PseAAC: prediction of lysine sumoylation sites using statistical moments and Chou's PseAAC.

PeerJ. 2021 Aug 4;9:e11581. doi: 10.7717/peerj.11581. eCollection 2021.

Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction the Chou's 5-steps Rule and General Pseudo Components.

Curr Genomics. 2019 Dec;20(8):592-601. doi: 10.2174/1389202921666191223154629.

Characterization of the relationship between FLI1 and immune infiltrate level in tumour immune microenvironment for breast cancer.

J Cell Mol Med. 2020 May;24(10):5501-5514. doi: 10.1111/jcmm.15205. Epub 2020 Apr 5.

iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments Chou's 5-steps Rule and Pseudo Components.

Curr Genomics. 2019 May;20(4):306-320. doi: 10.2174/1389202920666190819091609.

iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC Chou's 5-steps Rule.

Curr Genomics. 2019 May;20(4):275-292. doi: 10.2174/1389202920666190809095206.

Some illuminating remarks on molecular genetics and genomics as well as drug development.

Mol Genet Genomics. 2020 Mar;295(2):261-274. doi: 10.1007/s00438-019-01634-z. Epub 2020 Jan 1.
