Wu Jun, Helftenbein Gerd, Koslowski Michael, Sahin Ugur, Tureci Ozlem
Ganymed-Pharmaceuticals AG, Freiligrathstrasse 12, 55131 Mainz, Germany.
Proteins. 2006 Dec 1;65(4):808-15. doi: 10.1002/prot.21218.
To develop a strategy for identifying new members of protein families in silico, we devised a semi-automated procedure of consecutive PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool) searches that incorporates both identification and subsequent validation of putative candidates. As a proof of concept, we searched for novel members of the claudin family. The initial step was an iterated PSI-BLAST search of the human part of the RefSeq database, starting from the PMP22_Claudin domain of each known member of the claudin family. Putative new claudin domains derived from the converged list were then evaluated by a validating PSI-BLAST, in which each sequence was assessed for whether it retrieved the starting set of known claudin domains. The local PSI-BLAST searches and the validation were automated by a set of Perl scripts. With this strategy, three additional putative claudin domains in three different proteins were identified. One of them was characterized further and shown to exhibit claudin-like features in terms of protein structure and expression pattern. The strategy we present is an efficient and versatile tool for identifying novel members of domain-sharing protein families. The low false-positive rate achieved by including a validation step in the in silico procedure makes this strategy particularly attractive for selecting candidates for subsequent labor-intensive wet-bench characterization.
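The identify-then-validate logic described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. The PSI-BLAST run is abstracted as a caller-supplied function that maps a query identifier to a set of hit identifiers, and the names `forward_search`, `validate`, `toy_search`, the toy hit table, and the `min_recovered_fraction` threshold are all assumptions introduced here for illustration. The abstract does not state how much of the starting set a candidate had to retrieve, so the default of 1.0 (all seeds) is only one possible choice.

```python
from typing import Callable, Dict, Iterable, Set

# A search function maps a query identifier to the identifiers of its hits.
# In practice this would wrap a converged PSI-BLAST run; here it is abstract.
SearchFn = Callable[[str], Set[str]]


def forward_search(seeds: Iterable[str], search: SearchFn) -> Set[str]:
    """Step 1: query with every known family member and pool the new hits."""
    seed_set = set(seeds)
    pooled: Set[str] = set()
    for seed in seed_set:
        pooled |= search(seed)
    # Only hits that are not already known members are candidates.
    return pooled - seed_set


def validate(
    candidates: Iterable[str],
    seeds: Iterable[str],
    search: SearchFn,
    min_recovered_fraction: float = 1.0,  # assumed threshold, not from the paper
) -> Dict[str, float]:
    """Step 2: keep candidates whose own search retrieves the starting set."""
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("at least one seed sequence is required")
    accepted: Dict[str, float] = {}
    for candidate in sorted(candidates):
        recovered = search(candidate) & seed_set
        fraction = len(recovered) / len(seed_set)
        if fraction >= min_recovered_fraction:
            accepted[candidate] = fraction
    return accepted


# Toy stand-in for PSI-BLAST results (hypothetical identifiers and hits).
TOY_HITS: Dict[str, Set[str]] = {
    "CLDN1": {"CLDN1", "CLDN2", "candA", "candB"},
    "CLDN2": {"CLDN1", "CLDN2", "candA"},
    "candA": {"CLDN1", "CLDN2", "candA"},  # finds the seeds back
    "candB": {"candB", "other"},           # does not find the seeds back
}


def toy_search(query: str) -> Set[str]:
    """Look up precomputed toy hits instead of running a real search."""
    return TOY_HITS.get(query, set())


if __name__ == "__main__":
    seeds = ["CLDN1", "CLDN2"]
    candidates = forward_search(seeds, toy_search)
    print(validate(candidates, seeds, toy_search))
```

In this toy run, `candB` appears in the forward search but is discarded because its own search retrieves none of the seeds, which is the kind of false positive the validation step is meant to filter out. Passing the search as a function keeps the sketch independent of any particular BLAST installation or database.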