Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA 30318, USA.
Proteins. 2010 Jan;78(1):118-34. doi: 10.1002/prot.22566.
To exploit the vast amount of sequence information produced by the genomic revolution, the biological function of these sequences must be identified. In practice, this is often accomplished by functional inference. Purely sequence-based approaches are complicated by many factors, particularly in the "twilight zone" of low sequence similarity. For proteins, structure-based techniques aim to overcome these problems; however, most require high-quality crystal structures and suffer from the complex and equivocal relationship between protein fold and function. In this study, we use extensive benchmarking to examine several aspects of structure-based functional annotation: binding pocket detection, molecular function assignment, and ligand-based virtual screening. We demonstrate that protein threading driven by a strong sequence profile component greatly improves the quality of purely structure-based functional annotation in the "twilight zone." By detecting evolutionarily related proteins, it considerably reduces the high false positive rate of function inference based on global structure similarity alone. Combined evolution/structure-based function assignment emerges as a powerful technique that can make a significant contribution to comprehensive proteome annotation.
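The filtering idea in the abstract can be illustrated with a minimal sketch: a function label is transferred from a template only if the template is both structurally similar to the target and supported by a profile-driven threading score. This is not the authors' implementation. The `TemplateHit` type, the `transfer_functions` helper, the score scales, the thresholds, and the toy hits are all hypothetical and chosen only to show the logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class TemplateHit:
    """One template matched to a target protein (hypothetical record)."""
    template_id: str
    structure_similarity: float  # assumed global structure similarity in [0, 1]
    threading_score: float       # assumed profile-driven threading score
    function_label: str


def transfer_functions(
    hits: Iterable[TemplateHit],
    min_structure: float = 0.5,             # illustrative threshold, not from the paper
    min_threading: Optional[float] = None,  # None = structure-only inference
) -> Set[str]:
    """Collect function labels from templates that pass the filters."""
    labels = set()
    for hit in hits:
        # Structure filter: always applied.
        if hit.structure_similarity < min_structure:
            continue
        # Evolutionary filter: applied only when a threading threshold is given.
        if min_threading is not None and hit.threading_score < min_threading:
            continue
        labels.add(hit.function_label)
    return labels


# Toy data, invented for illustration only.
hits = [
    TemplateHit("tmpl_A", 0.72, 9.1, "kinase"),     # similar fold, related by evolution
    TemplateHit("tmpl_B", 0.68, 1.3, "isomerase"),  # similar fold, weak threading support
    TemplateHit("tmpl_C", 0.41, 8.5, "kinase"),     # fails the structure filter
]

structure_only = transfer_functions(hits)
combined = transfer_functions(hits, min_threading=5.0)
```

In this toy case, structure-only inference transfers both labels, while the combined filter drops the label from the template that shares the fold but lacks threading support. That is the kind of false positive the abstract says the evolutionary component reduces.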