Large-scale protein function prediction using heterogeneous ensembles.

Author information

Wang Linhua, Law Jeffrey, Kale Shiv D, Murali T M, Pandey Gaurav

Affiliations

Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.

Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.

Publication information

F1000Res. 2018 Sep 28;7. doi: 10.12688/f1000research.16415.1. eCollection 2018.

Abstract

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred ( https://github.com/GauravPandeyLab/LargeGOPred).
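Stacking trains a meta-learner on the scores produced by heterogeneous base predictors. The sketch below illustrates the general idea for a single GO term, using a logistic regression meta-learner. It is not LargeGOPred's implementation. Several things in it are hypothetical:

- the function names `fit_logistic_regression` and `stack_predict`;
- the toy scores and labels;
- the hyperparameters.

The sketch also assumes that the base-predictor scores are out-of-fold (cross-validated) predictions. That is the usual way to keep training labels from leaking into the meta-learner.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> List[float]:
    """Fit an L2-regularized logistic regression meta-learner by batch gradient descent.

    X holds one row per protein and one column per base predictor (assumed to be
    out-of-fold scores). Returns weights [bias, w_1, ..., w_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        # Average the gradient; apply the L2 penalty to weights but not the bias.
        w[0] -= lr * grad[0] / n
        for j in range(k):
            w[j + 1] -= lr * (grad[j + 1] / n + l2 * w[j + 1])
    return w


def stack_predict(w: Sequence[float], base_scores: Sequence[float]) -> float:
    """Combine one protein's base-predictor scores into a single probability."""
    return sigmoid(w[0] + sum(wj * s for wj, s in zip(w[1:], base_scores)))


# Hypothetical out-of-fold scores from three base predictors (e.g. trained on
# different data types) for six proteins, and their labels for one GO term.
train_scores = [
    [0.9, 0.8, 0.4],
    [0.8, 0.7, 0.6],
    [0.7, 0.9, 0.5],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.4],
    [0.3, 0.1, 0.6],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights = fit_logistic_regression(train_scores, train_labels)
print(round(stack_predict(weights, [0.85, 0.8, 0.5]), 3))
```

The meta-learner learns one weight per base predictor, so it can emphasize the data types or classifiers that are more informative for a given term. In this sketch's setup, a large-scale run would train one such model per GO term.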
