Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate.

Affiliations

Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.

Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.

Publication information

Int J Mol Sci. 2018 Jan 8;19(1):183. doi: 10.3390/ijms19010183.

DOI:10.3390/ijms19010183
PMID:29316706
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5796132/
Abstract

The function of a protein is of great interest in cutting-edge research on biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, BLAST predicts function from protein sequence similarity, while machine learning also works from the sequence but does not consider sequence similarity. This characteristic makes machine learning a good complement to BLAST and many other approaches for predicting the function of remotely related proteins and of homologous proteins with distinct functions. However, the identification accuracies of these in silico methods and their false discovery rates have not yet been assessed, which greatly limits the use of these algorithms. Herein, a comprehensive comparison of the performance of four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on the independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms. As a result, SVM and BLAST showed substantially higher sensitivity than PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
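
The abstract refers to "four standard statistical indexes" and a false discovery rate without naming or defining them. As a rough illustration only, the sketch below assumes the indexes are sensitivity, specificity, overall accuracy and the Matthews correlation coefficient, and uses the textbook definition FDR = FP / (FP + TP); the paper's genome-scan procedure may define the rate differently. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification indexes from confusion-matrix counts.

    Assumption: these are the kind of indexes the abstract alludes to;
    the abstract itself does not list them.
    """
    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),         # true positive rate
        "specificity": ratio(tn, tn + fp),         # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
        "fdr": ratio(fp, fp + tp),                 # false discovery rate (textbook form)
    }


if __name__ == "__main__":
    # Made-up counts for one protein family, for illustration only.
    for name, value in classification_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Computing these values per functional family and per algorithm would allow the kind of side-by-side comparison the abstract describes, where a method can rank well on sensitivity yet poorly on false discovery rate.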

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67a1/5796132/21cbf2c5e556/ijms-19-00183-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67a1/5796132/f5aca190b6ae/ijms-19-00183-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67a1/5796132/3a4424475d40/ijms-19-00183-g002.jpg
