pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset.

Authors

Xiao Xuan, Cheng Xiang, Chen Genqiang, Mao Qi, Chou Kuo-Chen

Affiliations

Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.

Gordon Life Science Institute, Boston, MA 02478, United States.

Publication information

Med Chem. 2019;15(5):496-509. doi: 10.2174/1573406415666181217114710.

DOI: 10.2174/1573406415666181217114710
PMID: 30556503

Abstract

BACKGROUND/OBJECTIVE

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Given the avalanche of protein sequences emerging in the post-genomic age, computational tools are urgently needed to identify subcellular localization in a timely and effective manner from sequence information alone. Recently, a predictor called "pLoc-mVirus" was developed for identifying the subcellular localization of virus proteins. Its performance is far better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems, in which some proteins, known as "multiplex proteins", may simultaneously occur in, or move between, two or more subcellular location sites. Although it is a very powerful predictor, it can be improved further, because pLoc-mVirus was trained on an extremely skewed dataset in which some subsets were more than 10 times the size of others. It therefore cannot avoid the bias caused by such an uneven training dataset.
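
The skew described here can be quantified by counting how many training proteins carry each location label. The following is a minimal Python sketch under stated assumptions: the toy dataset, the location names, and the helper names `label_counts` and `imbalance_ratio` are hypothetical illustrations and are not taken from the pLoc-mVirus benchmark.

```python
from collections import Counter


def label_counts(dataset):
    """Count proteins per location; a multiplex protein counts once per label."""
    counts = Counter()
    for _protein_id, locations in dataset:
        counts.update(set(locations))
    return counts


def imbalance_ratio(counts):
    """Size of the largest label subset divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: (protein id, set of subcellular locations).
toy = [(f"P{i}", {"host nucleus"}) for i in range(22)]
toy.append(("P22", {"host nucleus", "host cytoplasm"}))  # a multiplex protein
toy.append(("P23", {"viral capsid"}))
toy.append(("P24", {"host cytoplasm"}))

counts = label_counts(toy)
ratio = imbalance_ratio(counts)
print(counts, ratio)
```

A ratio well above 10, as in this toy case, corresponds to the kind of skew the abstract says biased the earlier predictor.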

METHODS

Using Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance the training dataset, we developed a new predictor called "pLoc_bal-mVirus" for predicting the subcellular localization of multi-label virus proteins.
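
The abstract does not spell out the feature definition or the IHTS procedure, so the following Python sketch is only an illustration of the general idea. It assumes a plain 20-dimensional amino acid composition as a stand-in for the PseAAC feature vector, and it fills minority subsets with synthetic points interpolated between random pairs of real samples, which is a swapped-in technique and not necessarily the authors' IHTS method. All function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """20-D amino acid frequency vector (a simplified stand-in for PseAAC)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def insert_hypothetical_samples(samples, target_size, rng):
    """Grow a subset to target_size by interpolating random pairs of samples.

    Assumption: interpolation is used here only for illustration; the
    source does not describe how IHTS generates its hypothetical samples.
    """
    grown = list(samples)
    while len(grown) < target_size:
        a, b = rng.choice(samples), rng.choice(samples)
        t = rng.random()
        grown.append([x + t * (y - x) for x, y in zip(a, b)])
    return grown


def balance(subsets, seed=0):
    """Bring every label subset up to the size of the largest one."""
    rng = random.Random(seed)
    target = max(len(v) for v in subsets.values())
    return {
        label: insert_hypothetical_samples(v, target, rng)
        for label, v in subsets.items()
    }


# Hypothetical toy sequences, not real virus proteins.
subsets = {
    "host nucleus": [aa_composition(s) for s in ["MKRLLA", "MSDNGK", "MATRKR", "MGPKKR"]],
    "viral capsid": [aa_composition(s) for s in ["MVLSAW", "MVISAF"]],
}
balanced = balance(subsets)
print({label: len(v) for label, v in balanced.items()})
```

Because each synthetic vector is a convex combination of two composition vectors, its components still sum to 1, so the inserted samples remain valid composition features.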

RESULTS

Cross-validation tests on exactly the same experiment-confirmed dataset indicated that the proposed predictor is remarkably superior to pLoc-mVirus, the existing state-of-the-art predictor for the same purpose.
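
The abstract does not state which cross-validation scheme or metrics were used. As a rough sketch of how such a comparison can be set up for a multi-label problem, the Python code below runs k-fold cross-validation and scores a prediction as correct only when the predicted label set matches the true set exactly. The fold scheme, the exact-match metric, the nearest-neighbor predictor, and the toy data are all assumptions chosen for illustration.

```python
def k_fold_indices(n, k):
    """Split range(n) into k disjoint, nearly equal test folds."""
    return [list(range(i, n, k)) for i in range(k)]


def exact_match_rate(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)


def nearest_neighbor(train_x, train_y, test_x):
    """Toy predictor: copy the label set of the closest training sample."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    preds = []
    for x in test_x:
        j = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
        preds.append(train_y[j])
    return preds


def cross_validate(features, labels, k, fit_predict):
    """Predict every sample once, using only the other folds for training."""
    n = len(features)
    preds = [None] * n
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        out = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        for i, p in zip(test_idx, out):
            preds[i] = p
    return exact_match_rate(labels, preds)


# Hypothetical toy data: 1-D features with single- and multi-label samples.
features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = [frozenset({"A"})] * 3 + [frozenset({"A", "B"})] * 3
score = cross_validate(features, labels, k=3, fit_predict=nearest_neighbor)
print(score)
```

Running two predictors through the same `cross_validate` call on the same dataset gives directly comparable scores, which mirrors the like-for-like comparison described in the results.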

CONCLUSION

A user-friendly web server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, through which most experimental scientists can easily obtain their desired results without working through the detailed mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and for in-depth understanding of biological processes in a cell.

Similar articles

1
pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset.
Med Chem. 2019;15(5):496-509. doi: 10.2174/1573406415666181217114710.
2
pLoc_bal-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by General PseAAC and Quasi-balancing Training Dataset.
Med Chem. 2019;15(5):472-485. doi: 10.2174/1573406415666181218102517.
3
pLoc_bal-mPlant: Predict Subcellular Localization of Plant Proteins by General PseAAC and Balancing Training Dataset.
Curr Pharm Des. 2018;24(34):4013-4022. doi: 10.2174/1381612824666181119145030.
4
pLoc_bal-mGpos: Predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC.
Genomics. 2019 Jul;111(4):886-892. doi: 10.1016/j.ygeno.2018.05.017. Epub 2018 May 26.
5
pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC.
J Theor Biol. 2018 Dec 7;458:92-102. doi: 10.1016/j.jtbi.2018.09.005. Epub 2018 Sep 8.
6
pLoc_bal-mAnimal: predict subcellular localization of animal proteins by balancing training dataset and PseAAC.
Bioinformatics. 2019 Feb 1;35(3):398-406. doi: 10.1093/bioinformatics/bty628.
7
pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC.
Gene. 2017 Sep 10;628:315-321. doi: 10.1016/j.gene.2017.07.036. Epub 2017 Jul 18.
8
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset.
Genomics. 2019 Dec;111(6):1274-1282. doi: 10.1016/j.ygeno.2018.08.007. Epub 2018 Sep 1.
9
pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC.
Genomics. 2018 Jan;110(1):50-58. doi: 10.1016/j.ygeno.2017.08.005. Epub 2017 Aug 14.
10
pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC.
Genomics. 2017 Oct 6. doi: 10.1016/j.ygeno.2017.10.002.

Cited by

1
HormoNet: a deep learning approach for hormone-drug interaction prediction.
BMC Bioinformatics. 2024 Feb 28;25(1):87. doi: 10.1186/s12859-024-05708-7.
2
A review from biological mapping to computation-based subcellular localization.
Mol Ther Nucleic Acids. 2023 Apr 20;32:507-521. doi: 10.1016/j.omtn.2023.04.015. eCollection 2023 Jun 13.
3
AptaNet as a deep learning approach for aptamer-protein interaction prediction.
Sci Rep. 2021 Mar 16;11(1):6074. doi: 10.1038/s41598-021-85629-0.
4
Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction the Chou's 5-steps Rule and General Pseudo Components.
Curr Genomics. 2019 Dec;20(8):592-601. doi: 10.2174/1389202921666191223154629.
5
Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA.
BMC Bioinformatics. 2020 May 24;21(1):212. doi: 10.1186/s12859-020-3539-1.
6
Characterization of the relationship between FLI1 and immune infiltrate level in tumour immune microenvironment for breast cancer.
J Cell Mol Med. 2020 May;24(10):5501-5514. doi: 10.1111/jcmm.15205. Epub 2020 Apr 5.
7
iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments Chou's 5-steps Rule and Pseudo Components.
Curr Genomics. 2019 May;20(4):306-320. doi: 10.2174/1389202920666190819091609.
8
iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC Chou's 5-steps Rule.
Curr Genomics. 2019 May;20(4):275-292. doi: 10.2174/1389202920666190809095206.
9
Some illuminating remarks on molecular genetics and genomics as well as drug development.
Mol Genet Genomics. 2020 Mar;295(2):261-274. doi: 10.1007/s00438-019-01634-z. Epub 2020 Jan 1.
10
RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz131.