Graphical models of residue coupling in protein families.

Author information

Thomas John, Ramakrishnan Naren, Bailey-Kellogg Chris

Affiliation

Department of Computer Science, Dartmouth College, Sudikoff Laboratory, Hanover, NH 03755, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2008 Apr-Jun;5(2):183-97. doi: 10.1109/TCBB.2007.70225.

DOI:10.1109/TCBB.2007.70225

PMID:18451428

Abstract

Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
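The working definition of coupling given in the abstract (some amino acid combinations at two positions occur more often than expected) can be illustrated with a simple pairwise statistic. The sketch below uses mutual information between two alignment columns as a stand-in; it is not the graphical-model learning procedure of the paper, and the function name `mutual_information` and the toy alignment are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings (a hypothetical toy MSA).
    This is an illustrative pairwise statistic, not the paper's method.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    left_counts = Counter(seq[i] for seq in alignment)
    right_counts = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # Ratio of the observed joint frequency to the product of marginals.
        ratio = (count * n) / (left_counts[a] * right_counts[b])
        mi += p_ab * log2(ratio)
    return mi


# Toy alignment (assumed data): column 0 is fully conserved,
# columns 1 and 3 always co-vary (K with E, R with D),
# and column 2 varies independently of column 1.
toy_alignment = ["AKLE", "AKLE", "ARLD", "ARLD", "AKVE", "ARVD"]

coupled = mutual_information(toy_alignment, 1, 3)
conserved = mutual_information(toy_alignment, 0, 1)
independent = mutual_information(toy_alignment, 1, 2)
```

In this toy case the co-varying pair scores high while the conserved and independent pairs score zero. A raw pairwise score like this cannot by itself separate direct from indirect covariation or class-specific conservation from family-wide coupling, which is the gap the abstract says a formal probabilistic model is meant to fill.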
