Protein sequence comparison and fold recognition: progress and good-practice benchmarking.

Affiliation

Gene Center and Center for Integrated Protein Science, Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, Munich, Germany.

Publication information

Curr Opin Struct Biol. 2011 Jun;21(3):404-11. doi: 10.1016/j.sbi.2011.03.005. Epub 2011 Mar 31.

DOI:10.1016/j.sbi.2011.03.005

PMID:21458982

Abstract

Protein sequence comparison methods have grown increasingly sensitive during the last decade and can often identify distantly related proteins sharing a common ancestor some 3 billion years ago. Although cellular function is not conserved so long, molecular functions and structures of protein domains often are. In combination with a domain-centered approach to function and structure prediction, modern remote homology detection methods have a great and largely underexploited potential for elucidating protein functions and evolution. Advances during the last few years include nonlinear scoring functions combining various sequence features, the use of sequence context information, and powerful new software packages. Since progress depends on realistically assessing new and existing methods and published benchmarks are often hard to compare, we propose 10 rules of good-practice benchmarking.
