Sjölander K
Molecular Applications Group, Palo Alto, CA 94303-1110, USA.
Proc Int Conf Intell Syst Mol Biol. 1998;6:165-74.
This work focuses on the inference of evolutionary relationships in protein superfamilies, and on the use of these relationships to identify key positions in the structure, to infer attributes on the basis of evolutionary distance, and to identify potential errors in sequence annotations. Relative entropy, a distance measure from information theory, is used in combination with Dirichlet mixture priors to estimate a phylogenetic tree for a set of proteins. This method infers key structural or functional positions in the molecule, and guides the tree topology to preserve these important positions within subtrees. Minimum-description-length principles are used to determine a cut of the tree into subtrees, which identifies the subfamilies in the data. The method is demonstrated on SH2-domain-containing proteins, resulting in a new subfamily assignment for Src2_drome and a suggested evolutionary relationship between Nck_human and Drk_drome, Sem5_caeel, Grb2_human and Grb2_chick.
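To make the distance computation concrete, the following is a minimal sketch of how relative entropy can compare two alignment columns once their amino-acid counts have been smoothed with a Dirichlet prior. It is not the implementation from the paper. It assumes a single-component Dirichlet prior instead of the mixture the abstract describes, a toy four-letter alphabet, and a symmetrized form of relative entropy. All function names and the example counts are hypothetical.

```python
import math

def posterior_mean(counts, alpha):
    """Posterior mean residue probabilities under ONE Dirichlet prior.

    Assumption: a single component with parameters `alpha`; the abstract
    describes a Dirichlet mixture, which this sketch does not implement.
    """
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]

def relative_entropy(p, q):
    """D(p || q) in bits; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_relative_entropy(p, q):
    """Sum of both directions, so the result does not depend on argument order.

    Assumption: symmetrizing is a choice made for this sketch only.
    """
    return relative_entropy(p, q) + relative_entropy(q, p)

# Hypothetical toy data: residue counts over a 4-letter alphabet
# observed at one alignment column in three candidate subfamilies.
alpha = [0.5, 0.5, 0.5, 0.5]
col_a = posterior_mean([8, 0, 0, 0], alpha)
col_b = posterior_mean([7, 1, 0, 0], alpha)
col_c = posterior_mean([0, 0, 8, 0], alpha)

d_ab = symmetric_relative_entropy(col_a, col_b)
d_ac = symmetric_relative_entropy(col_a, col_c)
```

Here `d_ab` comes out smaller than `d_ac`, because the first two columns conserve the same residue while the third conserves a different one. The prior keeps every probability strictly positive, so the logarithms stay finite even when a residue is never observed in a column.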