Department of Biology, University of Waterloo, 200 University Ave. West, Waterloo, ON N2L 3G1, Canada.
Curr Opin Struct Biol. 2016 Jun;38:53-61. doi: 10.1016/j.sbi.2016.05.017. Epub 2016 Jun 10.
Large-scale sequence and structural data is a goldmine of novel proteins, but how can this data be effectively mined for new functions? Here, we review protein function prediction methods and recent studies that apply these methods to discover new functionality. Core approaches include sequence-based homology detection, phylogenetic analysis, structural bioinformatics, and inference of functional associations using genomic context and related methods. With such a wide range of approaches, sequences may reveal new functionality regardless of their similarity to a characterized reference. Homologs of known function may be identified in unexpected species or associations. Detection of functional shifts in sequences may reveal new activities and specificities. New protein functions may also be predicted in uncharacterized sequences and structures. Finally, methods and data may be integrated and applied at increasingly large scales due to improved protein domain knowledge and structural coverage, which amplifies the ability to predict and discover novel protein functions.