Jablonska Jagoda, Matelska Dorota, Steczkiewicz Kamil, Ginalski Krzysztof
Laboratory of Bioinformatics and Systems Biology, Centre of New Technologies, University of Warsaw, Zwirki i Wigury 93, 02-089 Warsaw, Poland.
Nucleic Acids Res. 2017 Nov 16;45(20):11479-11494. doi: 10.1093/nar/gkx924.
The His-Me finger endonucleases, also known as HNH or ββα-metal endonucleases, form a large and diverse protein superfamily. The His-Me finger domain is found in proteins that play essential roles in cells, including genome maintenance, intron homing, host defense and target offense. Its overall structural compactness and non-specificity make it a perfectly tailored pathogenic module that participates on both sides of inter- and intra-organismal competition. Extremely low sequence similarity across the superfamily makes it difficult to identify and classify new His-Me fingers. Using state-of-the-art distant homology detection methods, we provide an updated and systematic classification of His-Me finger proteins. In this work, we identified over 100 000 proteins and clustered them into 38 groups, of which three are new and cannot be found in any existing public domain database of protein families. Based on an analysis of sequences, structures, domain architectures, and genomic contexts, we provide a careful functional annotation of the poorly characterized members of this superfamily. Our results may inspire further experimental investigations that address the predicted activities and clarify the potential substrates, providing more detailed insights into the fundamental biological roles of these proteins.