Bernardes Juliana S, Pedreira Carlos E
Federal University of Rio de Janeiro (UFRJ), COPPE-Engineering Graduate Program.
Recent Pat Biotechnol. 2013 Aug;7(2):122-41. doi: 10.2174/18722083113079990006.
Protein function prediction is one of the most challenging problems in the post-genomic era. The number of newly identified proteins has increased exponentially with advances in high-throughput techniques. However, the functional characterization of these new proteins has not kept pace. To fill this gap, a large number of computational methods have been proposed in the literature. Early approaches exploited homology relationships to transfer known functions to newly discovered proteins. Nevertheless, these approaches tend to fail when a new protein is considerably different (divergent) from previously known ones. Accordingly, more accurate approaches that use expressive data representations and sophisticated computational techniques are required. With these points in mind, this review provides a comprehensible description of the machine learning approaches currently applied to protein function prediction problems. We start by defining several problems involved in understanding protein function and describing how machine learning can be applied to them. We aim to present, in a systematic framework, the role of these techniques in protein function inference, which is sometimes difficult to follow because of the rapid evolution of the field. To this end, we highlight the most representative contributions and recent advances, and provide an insightful categorization and classification of machine learning methods in functional proteomics.