MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine.

Authors

Thakur Anamika, Rajput Akanksha, Kumar Manoj

Affiliation

Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Sector 39-A, Chandigarh-160036, India.

Publication details

Mol Biosyst. 2016 Jul 19;12(8):2572-86. doi: 10.1039/c6mb00241b.

DOI:10.1039/c6mb00241b
PMID:27272007
Abstract

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. Therefore, we have developed "MSLVP", a two-tier prediction algorithm for predicting multiple SCLs of viral proteins. For this study, comprehensive data sets of viral proteins with experimentally validated SCL annotation were collected from UniProt. Non-redundant (90%) data sets of 3480 viral proteins that belonged to single (2715), double (391) and multiple (374) sites were employed. Additionally, 1687 (30% sequence identity) viral proteins were categorised into single (1366), double (167) and multiple (154) sites. Single, double and multiple locations were further divided into eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, remaining others and combinations thereof. Support vector machine based models were developed using sequence features such as amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We employed "one-versus-one" as well as "one-versus-other" strategies for multiclass classification. The "one-versus-one" strategy performed better than the "one-versus-other" approach during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthews correlation coefficient (MCC) and a receiver operating characteristic (ROC) of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00; and 99.90%, 1.00, 1.00 for single, double and multiple locations, respectively. Similar results were achieved for the 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent data set. The MSLVP web server () can predict subcellular locations, i.e., single (8; including single- and multi-pass membrane), double (4) and multiple (6). This would be helpful for elucidating the functional annotation of viral proteins and potential drug targets.
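
To make the sequence features named in the abstract concrete, the following is a minimal sketch of amino acid composition, dipeptide composition and their hybrid, together with the binary Matthews correlation coefficient. It is not the authors' code: all function names are hypothetical, physicochemical properties and SVM training are omitted, and it assumes that only the 20 standard residues are counted.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption: only the 20 standard amino acids contribute to the features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in the sequence (400 values)."""
    seq = seq.upper()
    # Keep only pairs in which both neighbouring residues are standard.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def hybrid_features(seq):
    """One possible hybrid: the two compositions concatenated (420 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC for labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Vectors of this kind could be passed to any SVM implementation. In a "one-versus-one" scheme one binary classifier is trained per pair of location classes, whereas a "one-versus-other" scheme trains one classifier per class against all remaining classes.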
