Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands.
Keygene N.V., 6708PW Wageningen, The Netherlands.
Genes (Basel). 2020 Oct 27;11(11):1264. doi: 10.3390/genes11111264.
New DNA and protein sequences are currently being generated faster than their functions can be determined experimentally, which emphasizes the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, the problem is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning, with which AFP can again take a large step forward in the 2020s, reinforcing the power of computational biology.