Livesay Dennis R, Kidd Patrick D, Eskandari Sepehr, Roshan Usman
Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28262, USA.
BMC Bioinformatics. 2007 Oct 17;8:397. doi: 10.1186/1471-2105-8-397.
Efforts to predict functional sites in globular proteins are increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. Consequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Because a structure has recently been solved and a vast amount of experimental mutagenesis data is available, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions.
The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies that are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions; most of these sites are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).
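The overlap analysis described above (scoring the sites predicted by at least a given number of methods against a reference set) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "accuracy" means the fraction of predicted sites found in the reference set and that "coverage" means the fraction of reference sites that are predicted; the abstract does not define either term. The method names, alignment positions, and function names are hypothetical toy values chosen only for illustration.

```python
from collections import Counter


def consensus_sites(predictions, min_methods):
    """Return the positions predicted by at least `min_methods` methods.

    `predictions` maps a method name to the set of positions it predicts.
    """
    counts = Counter(pos for sites in predictions.values() for pos in set(sites))
    return {pos for pos, n in counts.items() if n >= min_methods}


def accuracy_and_coverage(predicted, reference):
    """Score a prediction set against a reference set of functional sites.

    Assumed definitions (not taken from the source):
      accuracy = |predicted & reference| / |predicted|
      coverage = |predicted & reference| / |reference|
    """
    hits = predicted & reference
    accuracy = len(hits) / len(predicted) if predicted else 0.0
    coverage = len(hits) / len(reference) if reference else 0.0
    return accuracy, coverage


# Hypothetical toy data: positions are arbitrary, not real NSS residues.
reference = {10, 20, 30, 40}
predictions = {
    "method_a": {10, 20, 50},
    "method_b": {10, 20, 30, 60},
    "method_c": {10, 30, 70},
}

for k in (1, 2, 3):
    sites = consensus_sites(predictions, k)
    acc, cov = accuracy_and_coverage(sites, reference)
    print(f"at least {k} method(s): {sorted(sites)} accuracy={acc:.2f} coverage={cov:.2f}")
```

In this toy example, raising the threshold from one method to two removes the sites predicted by a single method only, so accuracy rises while coverage is unchanged; at three methods accuracy stays high but coverage falls. Real data need not follow the same pattern.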
The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family. As such, we expect similar bioinformatic investigations will streamline functional investigations within membrane integral families in the absence of structure.