Wang Chuan, Dong Xiaobao, Han Lei, Su Xiao-Dong, Zhang Ziding, Li Jinyan, Song Jiangning
State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China; Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA.
State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
J Theor Biol. 2016 Jun 7;398:122-9. doi: 10.1016/j.jtbi.2016.03.025. Epub 2016 Mar 25.
A WD40 protein typically contains four or more repeats of ~40 residues, each ending with the Trp-Asp dipeptide; the repeats fold into β-propellers with four β strands per repeat. WD40 proteins often function as scaffolds for protein-protein interactions and are involved in numerous fundamental biological processes. Despite their functional importance, the "velcro" closure of WD40 propellers and the diversity of WD40 repeats make them difficult to identify. Here we develop a new WD40 Repeat Recognition method (WDRR), which uses predicted secondary structure information to generate candidate repeat segments and then employs a profile-profile alignment to identify the correct WD40 repeats among the candidates. In particular, we design a novel alignment scoring function that combines the dot product and BLOSUM62, thereby achieving a good balance of sensitivity and accuracy. With these strategies, WDRR effectively reduces the false positive rate and accurately identifies more remotely homologous WD40 repeats with precise repeat boundaries. We further use WDRR to re-annotate the Pfam families in the β-propeller clan (CL0186) and identify a number of WD40 repeat proteins with high confidence across nine model organisms. The WDRR web server and the datasets are available at http://protein.cau.edu.cn/wdrr/.
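The abstract does not give the form of the scoring function, so the following is only a minimal sketch of how a profile-profile column score might combine a dot product with BLOSUM62. The linear mix, the `weight` parameter, the function names, and the four-residue alphabet are all assumptions made for illustration, not the published WDRR formula. The BLOSUM62 entries shown are the standard values for those four residues.

```python
# Illustrative sketch only: the linear combination and `weight` are assumptions,
# not the scoring function published for WDRR.

# Reduced alphabet for brevity; a real profile column would cover all 20 residues.
ALPHABET = "WDGH"

# Standard BLOSUM62 values for the residue pairs in the reduced alphabet.
BLOSUM62 = {
    ("W", "W"): 11, ("D", "D"): 6, ("G", "G"): 6, ("H", "H"): 8,
    ("W", "D"): -4, ("W", "G"): -2, ("W", "H"): -2,
    ("D", "G"): -1, ("D", "H"): -1, ("G", "H"): -2,
}


def blosum(a, b):
    """Look up a symmetric BLOSUM62 entry."""
    return BLOSUM62[(a, b)] if (a, b) in BLOSUM62 else BLOSUM62[(b, a)]


def dot_score(p, q):
    """Dot product of two profile columns (residue -> frequency)."""
    return sum(p[a] * q[a] for a in ALPHABET)


def blosum_score(p, q):
    """Expected BLOSUM62 score over all residue pairs drawn from the two columns."""
    return sum(p[a] * q[b] * blosum(a, b) for a in ALPHABET for b in ALPHABET)


def combined_score(p, q, weight=0.5):
    """Hypothetical mix: weight * dot product + (1 - weight) * BLOSUM62 term."""
    return weight * dot_score(p, q) + (1.0 - weight) * blosum_score(p, q)


def one_hot(residue):
    """Profile column with all probability mass on a single residue."""
    return {a: (1.0 if a == residue else 0.0) for a in ALPHABET}


if __name__ == "__main__":
    print(combined_score(one_hot("W"), one_hot("W")))  # identical columns
    print(combined_score(one_hot("W"), one_hot("D")))  # dissimilar columns
```

In this sketch the dot product rewards columns with the same residue distribution, while the BLOSUM62 term gives partial credit to columns dominated by different but substitutable residues. That is one plausible reading of how combining the two could trade off sensitivity against accuracy.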