Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland, Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA USA-22908, USA, Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, PL-61-614 Poznan, Poland, Laboratory of Bioinformatics and Systems Biology, Centre of New Technologies, University of Warsaw, Zwirki i Wigury 93, PL-02-089 Warsaw, Poland, Institute of Biochemistry and Biophysics PAS, Pawinskiego 5A, PL-02-106 Warsaw, Poland and Laboratory of Protein Structure, International Institute of Molecular and Cell Biology, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland.
Nucleic Acids Res. 2014 Apr;42(7):4160-79. doi: 10.1093/nar/gkt1414. Epub 2014 Jan 23.
The ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. Proteins of the RNHL superfamily show extensive divergence of sequence and structure. We conducted database searches to identify members of the RNHL superfamily (including previously unknown ones), yielding >60 000 unique domain sequences. Our analysis identified new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families and suggested a possible evolutionary history of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed a correlation between the orientation of the C-terminal helix and both the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of previously uncharacterized protein families.