Dunn S D, Wahl L M, Gloor G B
Department of Biochemistry, University of Western Ontario, London, Ontario, Canada, N6A 5C1.
Bioinformatics. 2008 Feb 1;24(3):333-40. doi: 10.1093/bioinformatics/btm604. Epub 2007 Dec 5.
Compensating alterations during the evolution of protein families give rise to coevolving positions that contain important structural and functional information. However, a high background composed of random noise and phylogenetic components interferes with the identification of coevolving positions.
We have developed a rapid, simple and general method based on information theory that accurately estimates the level of background mutual information for each pair of positions in a given protein family. Removal of this background results in a metric, MIp, that correctly identifies substantially more coevolving positions in protein families than any existing method. A significant fraction of these positions coevolve strongly with one or only a few positions. The vast majority of such position pairs are in contact in representative structures. The identification of strongly coevolving position pairs can be used to impose significant structural limitations and should be an important additional constraint for ab initio protein folding.
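The abstract does not spell out how the background is estimated, so the following is only a minimal sketch. It assumes the background for a pair of positions is the product of each position's mean mutual information with all other positions, divided by the overall mean mutual information (an average product correction), and that MIp is the raw mutual information minus this term. The function names `mutual_information` and `mip_matrix`, the toy alignment, and the treatment of every character as an ordinary symbol (no gap handling) are illustrative assumptions, not the authors' program.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (count_a[x] * count_b[y]))
    return mi


def mip_matrix(alignment):
    """Background-corrected MI for every pair of columns (needs >= 2 columns).

    Assumption: background(i, j) = mean_MI(i) * mean_MI(j) / overall_mean_MI.
    """
    columns = list(zip(*alignment))
    m = len(columns)
    mi = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        mi[i][j] = mi[j][i] = mutual_information(columns[i], columns[j])

    # Mean MI of each column with every other column.
    col_mean = [sum(mi[i][j] for j in range(m) if j != i) / (m - 1)
                for i in range(m)]
    # Mean MI over all distinct column pairs.
    n_pairs = m * (m - 1) / 2
    overall = sum(mi[i][j] for i, j in combinations(range(m), 2)) / n_pairs

    mip = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        background = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        mip[i][j] = mip[j][i] = mi[i][j] - background
    return mip


# Toy alignment (hypothetical): columns 0 and 1 covary perfectly,
# column 2 varies independently, column 3 is invariant.
toy_alignment = ["ADAG", "ADCG", "CEAG", "CECG"]
toy_mip = mip_matrix(toy_alignment)
```

In this toy case only the pair of columns 0 and 1 keeps a positive score after the background is subtracted. On a real family, the highest-scoring pairs would be the candidates to compare against contacts in a representative structure.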
Alignments and program files can be found in the Supplementary Information.