Meta-prediction of protein subcellular localization with reduced voting.

Authors

Liu Jie, Kang Shuli, Tang Chuanning, Ellis Lynda B M, Li Tongbin

Affiliation

Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455, USA.

Publication

Nucleic Acids Res. 2007;35(15):e96. doi: 10.1093/nar/gkm562. Epub 2007 Aug 1.

Abstract

Meta-prediction seeks to harness the combined strengths of multiple predicting programs with the hope of achieving predicting performance surpassing that of all existing predictors in a defined problem domain. We investigated meta-prediction for the four-compartment eukaryotic subcellular localization problem. We compiled an unbiased subcellular localization dataset of 1693 nuclear, cytoplasmic, mitochondrial and extracellular animal proteins from Swiss-Prot 50.2. Using this dataset, we assessed the predicting performance of 12 predictors from eight independent subcellular localization predicting programs: ELSPred, LOCtree, PLOC, Proteome Analyst, PSORT, PSORT II, SubLoc and WoLF PSORT. The Gorodkin correlation coefficient (GCC) was one of the performance measures. Proteome Analyst is the best individual subcellular localization predictor tested in this four-compartment prediction problem, with GCC = 0.811. A reduced voting strategy eliminating six of the 12 predictors yields a meta-predictor (RAW-RAG-6) with GCC = 0.856, substantially better than all tested individual subcellular localization predictors (P = 8.2 × 10⁻⁶, Fisher's Z-transformation test). The improvement in performance persists when the meta-predictor is tested with data not used in its development. This and similar voting strategies, when properly applied, are expected to produce meta-predictors with outstanding performance in other life sciences problem domains.
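
To make the two central ideas of the abstract concrete, the following is a minimal Python sketch of voting over several predictors and of scoring the result with the Gorodkin correlation coefficient (the K-category generalization of the Matthews correlation coefficient). It is an illustration under stated assumptions, not the RAW-RAG-6 procedure itself: the function names, the plain plurality rule, the tie-breaking rule and the toy data are all hypothetical, and the abstract does not specify how votes are weighted or how ties are resolved.

```python
from collections import Counter
from math import sqrt

# The four compartments named in the abstract.
LABELS = ["nuclear", "cytoplasmic", "mitochondrial", "extracellular"]


def plurality_vote(predictions):
    """Return the label predicted most often.

    Assumption (not from the source): ties go to the label that
    appears first in the `predictions` list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def gorodkin_cc(true_labels, predicted_labels, labels=LABELS):
    """Gorodkin K-category correlation coefficient from two label lists."""
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    # confusion[i][j] = number of proteins of true class i predicted as class j
    confusion = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, predicted_labels):
        confusion[index[t]][index[p]] += 1

    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # Return 0.0 when the coefficient is undefined (e.g. a single predicted class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy data only: three hypothetical predictors, four hypothetical proteins.
    truth = ["nuclear", "cytoplasmic", "mitochondrial", "extracellular"]
    per_protein_votes = [
        ["nuclear", "nuclear", "cytoplasmic"],
        ["cytoplasmic", "nuclear", "cytoplasmic"],
        ["mitochondrial", "mitochondrial", "mitochondrial"],
        ["cytoplasmic", "extracellular", "extracellular"],
    ]
    meta = [plurality_vote(votes) for votes in per_protein_votes]
    print(meta, gorodkin_cc(truth, meta))
```

In this sketch, a "reduced" vote would simply mean dropping some predictors' entries from each vote list before calling `plurality_vote`; which predictors to drop is exactly what the study evaluates and is not reproduced here.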

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b8f/1976432/f2611e5aba2e/gkm562f1.jpg
